1 From probability theory to Kolmogorov complexity



1.1 Randomness and Probability theory 

Quisani^1: I just found a surprising assertion on Leonid Levin's home page:

While fundamental in many areas of Science, randomness is re- 
ally "native" to Computer Science. 

Common sense would rather consider randomness as intrinsically relevant 
to Probability theory! 

Authors: Levin also adds: "The computational nature of randomness was
clarified by Kolmogorov."

The point is that, from its very origin to its modern axiomatization around
1933 [21] by Andrei Nikolaievitch Kolmogorov (1903-1987), Probability theory
carries a paradoxical result:

if we toss an unbiased coin 100 times then 100 heads are just as
probable as any other outcome!

As Peter Gács pleasingly remarks ([17], p. 3), this convinces us only that the
axioms of Probability theory, as developed in [21], do not solve all mysteries
that they are sometimes supposed to.

In fact, since Laplace, much work has been devoted to getting a mathematical
theory of random objects, notably by Richard von Mises (1883-1953) (cf.
§8.2). But none was satisfactory up to the 60's, when such a theory emerged
on the basis of computability.

As it sometimes occurs, the theory was discovered by several authors
independently.^2 In the USA, Ray J. Solomonoff (b. 1926), 1964 [42] (a paper
submitted in 1962) and Gregory J. Chaitin (b. 1947), 1966 [5], 1969 [6]
(both papers submitted in 1965). In Russia, Kolmogorov, 1965 [23], with
premises announced in 1963 [22].

Q: Same phenomenon as for hyperbolic geometry with Gauss, Lobachevsky
and Bolyai. I recently read a quotation from Bolyai's father: "When the time is
ripe for certain things, these things appear in different places in the manner
of violets coming to light in early spring".

^1 Quisani is a student with quite eclectic scientific curiosity, who works under Yuri
Gurevich's supervision.

^2 For a detailed analysis of who did what, and when, see [30] p. 89-92.

A: Mathematics and poetry... Well, pioneered by Kolmogorov, Martin-Löf,
Levin, Gács, Schnorr (in Europe) and Chaitin, Solovay (in America), the
theory developed very fruitfully and is now named Kolmogorov complexity
or Algorithmic Information Theory.

Q: So, Kolmogorov founded Probability Theory twice! In the 30's and then 
in the 60's. 

A: Hum. . . In the 30's Kolmogorov axiomatized Probability Theory on the 
basis of measure theory, i.e. integration theory on abstract spaces. In the 
60's, Kolmogorov (and also Solomonoff and Chaitin independently) founded 
a mathematical theory of randomness. That it could be a new basis for 
Probability Theory is not clear. 

Q: What? Randomness would not be the natural basis for Probability 
Theory? 

A: Random numbers are useful in different kinds of applications: simulations
of natural phenomena, sampling for testing "typical cases", getting
good sources of data for algorithms, ... (cf. Donald Knuth, [20], chapter 3).

However, the notion of random object as a mathematical notion is presently
ignored in lectures about Probability Theory. Be it for the foundations or for
the development of Probability Theory, such a notion is neither introduced
nor used. That's the way it is... There is a notion of random variable, but it
has really nothing to do with random objects. Formally, random variables are
just functions over some probability space. The name "random variable" is a mere
label conveying the underlying non-formalized intuition of randomness.

Q: That's right. I attended several courses on Probability Theory. Never 
heard anything precise about random objects. And, now that you tell me, I 
realize that there was something strange for me with random variables. 

So, finally, our concrete experience of chance and randomness on which 
we build so much intuition is simply removed from the formalization of 
Probability Theory. 

Hum. . . Somehow, it's as if the theory of computability and programming 
were omitting the notion of program, real programs. 

By the way, isn't it the case? In recursion theory, programs are reduced to
mere integers: Gödel numbers!

A: Sure, recursion theory illuminates but does not exhaust the subject of 
programming. 

As concerns a new foundation of Probability Theory, it's already quite
remarkable that Kolmogorov has looked at his own work on the subject with
such a distance. So much as to come to a new theory: the mathematization
of randomness. However, it seems (to us) that Kolmogorov has been ambiguous
on the question of a new foundation. Indeed, in his first paper on
the subject (1965, [23], p. 7), Kolmogorov briefly evoked that possibility:

... to consider the use of the [Algorithmic Information Theory] 
constructions in providing a new basis for Probability Theory. 

However, later (1983, [25], p. 35-36), he separated both topics:

there is no need whatsoever to change the established construction
of the mathematical probability theory on the basis of the
general theory of measure. I am not inclined to attribute the
significance of necessary foundations of probability theory to the
investigations [about Kolmogorov complexity] that I am now going
to survey. But they are most interesting in themselves.

though stressing the role of his new theory of random objects for mathematics
as a whole ([25], p. 39):

The concepts of information theory as applied to infinite sequences
give rise to very interesting investigations, which, without
being indispensable as a basis of probability theory, can acquire
a certain value in the investigation of the algorithmic side
of mathematics as a whole.

Q: All this is really exciting. Please, tell me about this approach to ran- 
domness. 

1.2 Intuition of finite random strings and Berry's paradox 

A: OK. We shall first consider finite strings. 

If you don't mind, we can start with an approach which actually fails but
conveys the basic intuitive idea of randomness. Well, just for a while, let's
say that a finite string u is random if there is no shorter way to describe u
than to give the successive symbols which constitute u. In other words, the
shortest description of u is u itself, i.e. the very writing of the string u.

Q: Something to do with intensionality and extensionality? 

A: You are completely right. Our tentative definition declares a finite string
to be random just in case it does not carry any intensionality. So that there
is no description of u but the extensional one, which is u itself.

Q: But the notion of description is somewhat vague. Is it possible to be
more precise about "description" and intensionality?

A: Diverse partial formalizations are possible. For instance within any par- 
ticular logical first-order structure. But they are quite far from exhausting 
the intuitive notion of definability. In fact, the untamed intuitive notion 
leads to paradoxes, much as the intuition of truth. 

Q: I presume you mean the liar paradox as concerns truth. As for definability,
it should be Berry's paradox about

"the smallest integer not definable in less than eleven words"

and this integer is indeed defined by this very sentence containing only 10
words.

A: Yes, these ones precisely. By the way, this last paradox was first mentioned
by Bertrand Russell, 1908 ([38], p. 222 or 150) who credited G.G.
Berry, an Oxford librarian, for the suggestion.

1.3 Kolmogorov complexity relative to a function 

Q: And how can one get around such problems? 

A: What Solomonoff, Kolmogorov and Chaitin did is a very ingenious move: 
instead of looking for a general notion of definability, they restricted it to 
computability. Of course, computability is a priori as much a vague and 
intuitive notion as is definability. But, as you know, since the thirties, there 
is a mathematization of the notion of computability. 

Q: Thanks to Kurt, Alan and Alonzo.^3

A: Hum. . . Well, with such a move, general definitions of a string u are 
replaced by programs which compute u. 

Q: Problem: we have to admit Church's thesis. 

A: OK. In fact, even if Church's thesis were to break down, the theory of
computable functions would still remain as elegant a theory as you learned
from Yuri and other people. It would just be a formalization of a proper
part of computability, as is the theory of primitive recursive functions or
elementary functions. As concerns Kolmogorov theory, it would still hold
and surely extend to such a new context.

^3 Kurt Gödel (1906-1978), Alan Mathison Turing (1912-1954), Alonzo Church (1903-1995).

Q: But where do the programs come from? Are you considering Turing
machines or some programming language?

A: Any partial computable function $A : \{0,1\}^* \to \{0,1\}^*$ is considered as
a programming language. The domain of $A$ is seen as a family of programs,
the value $A(p)$, if there is any, is the output of program $p$. As a whole,
$A$ can be seen both as a language to write programs and as the associated
operational semantics.

Now, Kolmogorov complexity relative to $A$ is the function $K_A : \{0,1\}^* \to \mathbb{N}$
which maps a string $x$ to the length of the shortest programs which output $x$:

Definition 1. $K_A(x) = \min\{|p| : A(p) = x\}$

(Convention: $\min(\emptyset) = +\infty$, so that $K_A(x) = +\infty$ if $x$ is outside the range
of $A$).
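Definition 1 can be tried out on a deliberately tiny case. The sketch below assumes a hypothetical "language" A given by a finite table (so it is a partial function with a four-element domain); the table entries are invented for illustration only.

```python
from math import inf

# Hypothetical toy language: a partial function A given by a finite table
# mapping programs (binary strings) to outputs.
A = {"0": "0000", "10": "0101", "110": "0000", "0111": "1"}

def K_A(x):
    """K_A(x) = min{|p| : A(p) = x}, with the convention min(empty set) = +infinity."""
    return min((len(p) for p, out in A.items() if out == x), default=inf)

print(K_A("0000"))  # two programs output "0000"; the shorter one has length 1
print(K_A("11"))    # "11" is outside the range of A
```

Two programs output "0000" but only the shorter one counts, and a string outside the range of A gets complexity +infinity, exactly as in the convention above.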

Q: This definition reminds me of a discussion I had with Yuri some years
ago ([19] p. 76-78). Yuri explained things to me about Levin complexity. I
remember it involved time.

A: Yes. Levin complexity is a very clever variant of K which adds to the 
length of the program the log of the computation time to get the output. 
It's a much finer notion. We shall not consider it for our discussion about 
randomness. You'll find some developments in [30] §7.5. 

Q: There are programs and outputs. Where are the inputs? 

A: We can do without inputs. It's true that functions with no argument
are not considered in mathematics, but in computer science, they are. In
fact, since von Neumann, we all know that there can be as much trade-off
as desired between input and program. This is indeed the basic idea for
universal machines and computers.

Nevertheless, Kolmogorov [23] points out a natural role for inputs when considering
conditional Kolmogorov complexity, in a sense very much like that of
conditional probabilities.

To that purpose, consider a partial computable function
$B : \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$. A pair $(p,y)$ in the domain of $B$ is interpreted as a
program $p$ together with an input $y$. And $B(p,y)$ is the output of program
$p$ on input $y$. Kolmogorov [23] defines the conditional complexity relative
to $B$ as the function $K_B(\ |\ ) : \{0,1\}^* \times \{0,1\}^* \to \mathbb{N}$ which maps strings $x, y$
to the length of the shortest programs which output $x$ on input $y$:

Definition 2. $K_B(x \mid y) = \min\{|p| : B(p,y) = x\}$
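To see how an input can lower complexity, here is a minimal sketch with a hypothetical two-instruction language B (an assumption made for illustration, not a language from the literature): the program "0" copies its input, and a program "1" followed by w prints w literally.

```python
from itertools import product
from math import inf

def B(p, y):
    """Hypothetical toy language with input: '0' copies the input y,
    '1' + w prints the literal w, any other program is undefined (None)."""
    if p == "0":
        return y
    if p.startswith("1"):
        return p[1:]
    return None

def K_B(x, y):
    """Length of a shortest program p with B(p, y) == x, found by brute force."""
    for n in range(len(x) + 2):            # the literal program has length |x| + 1
        for bits in product("01", repeat=n):
            if B("".join(bits), y) == x:
                return n
    return inf                             # not reached: the literal program always works

print(K_B("0110", "0110"))  # the input already is x: the one-symbol program "0" suffices
print(K_B("0110", "111"))   # the input does not help: the literal program "10110" is needed
```

In this toy setting, knowing y = x makes x as cheap as possible, while an unrelated y leaves the cost at |x| + 1, which mirrors the way conditioning on an event can change a probability.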
1.4 Why binary programs?

Q: Programs should be binary strings?

A: This is merely a reasonable restriction.

Binary strings surely have some flavor of machine level programming. But
this has nothing to do with the present choice. In fact, binary strings just
allow for a fairness condition. The reason is that Kolmogorov complexity
deals with lengths of programs. Cubing the alphabet (binary to octal) divides
all lengths by 3, and taking its fourth power (binary to hexadecimal) divides
them by 4. So that binary representation of programs is merely a way to get
an absolute measure of length. If we were to consider programs $p$ written
in some finite alphabet $\Sigma$, we would have to replace the length $|p|$ by the
product $|p| \log(\mathrm{card}(\Sigma))$ where $\mathrm{card}(\Sigma)$ is the number of symbols in $\Sigma$. This
is an important point when comparing Kolmogorov complexities associated
to diverse programming languages, cf. §2.1.
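The rescaling by the logarithm of the alphabet size can be checked on one concrete value. In this small sketch the helper name `absolute_length` is ours, and the logarithm is taken in base 2 so that binary strings keep their own length.

```python
from math import log2

def absolute_length(p, alphabet_size):
    """Length of the string p rescaled to bits: |p| * log2(card(alphabet))."""
    return len(p) * log2(alphabet_size)

n = 0b101111000011                      # a 12-bit value
binary, octal, hexa = format(n, "b"), format(n, "o"), format(n, "x")

print(binary, len(binary), absolute_length(binary, 2))
print(octal, len(octal), absolute_length(octal, 8))
print(hexa, len(hexa), absolute_length(hexa, 16))
```

The three writings have 12, 4 and 3 symbols respectively, yet the rescaled length is the same for all of them.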

1.5 What about other possible outputs? 

Q: Outputs should also be binary strings? 

A: Of course not. In Kolmogorov's approach, outputs are the finite objects
for which a notion of randomness is looked for. Binary strings constitute a
simple instance. One can as well consider integers, or rationals or elements
of any structure $D$ with a natural notion of computability. The modification
is straightforward: now $A$ is a partial computable function $A : \{0,1\}^* \to D$
and $K_A : D \to \mathbb{N}$ is defined in the same way: $K_A(x)$, for $x \in D$, is the
minimum length of a program $p$ such that $A(p) = x$.

2 Optimal Kolmogorov complexity 

2.1 The Invariance Theorem

Q: Well, for each partial computable function $A : \{0,1\}^* \to \{0,1\}^*$ (or
$A : \{0,1\}^* \to D$, as you just explained) there is an associated Kolmogorov
complexity. So, what is the Kolmogorov complexity of a given string? It
depends on the chosen $A$.

A: Now comes the fundamental result of the theory, the so-called invariance
theorem. We shall state it only for Kolmogorov complexity but it also
holds for conditional Kolmogorov complexity.

Recall the enumeration theorem: partial computable functions can be enumerated
in a partial computable way. This means that there exists a partial
computable function $E : \mathbb{N} \times \{0,1\}^* \to \{0,1\}^*$ such that, for every partial
computable function $A : \{0,1\}^* \to \{0,1\}^*$, there is some $e \in \mathbb{N}$ for
which we have $\forall p\ A(p) = E(e,p)$ (equality means that $A(p)$ and $E(e,p)$ are
simultaneously defined or not and, if defined, of course they must be equal).

Q: Wait, wait, I remember the diagonal argument which proves that there is
no enumeration of functions $\mathbb{N} \to \mathbb{N}$. It goes through computability. Given
$E : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$, the function $A : \mathbb{N} \to \mathbb{N}$ such that $A(n) = E(n,n) + 1$
is different from each one of the functions $n \mapsto E(e,n)$. And if
$E$ is computable then $A$ is computable.

So, how can there be an enumeration of computable functions? 

A: There is no computable enumeration of computable functions. The
diagonal argument you recalled proves that this is impossible. No way.
But, we are not considering computable functions but partial computable
functions. This makes a big difference. The diagonal argument breaks down.
In fact, equality $E(n,n) = E(n,n) + 1$ is not incoherent: it just ensures that
$E(n,n)$ is not defined!

Q: Very strange property, indeed.

A: No, no. It's quite intuitive nowadays, in our world with computers.
Given a program in some fixed programming language, say LISP,
an interpreter executes it. Thus, with one more argument, the simulated
program, the LISP interpreter enumerates all functions which can be computed
by a LISP program. Now, any partial computable function admits a LISP
program. Thus, the LISP interpreter gives you a partial computable enumeration
of partial computable functions.



Q: OK.

A: Let's go back to the invariance theorem. We transform $E$ into a one
argument partial computable function $U : \{0,1\}^* \to \{0,1\}^*$ as follows: Set

$$U(0^e 1 p) = E(e,p), \qquad U(q) \text{ undefined if } q \text{ contains no occurrence of } 1$$

(where $0^e$ is the string $00\ldots0$ with length $e$).

Then, if $A : \{0,1\}^* \to \{0,1\}^*$ is partial computable and $e_0$ is such that
$\forall p\ A(p) = E(e_0,p)$, we have

$$\begin{aligned}
K_U(x) &= \min\{|q| : U(q) = x\} && \text{(definition of } K_U\text{)} \\
&= \min\{|0^e 1 p| : U(0^e 1 p) = x\} \\
&\le \min\{|0^{e_0} 1 p| : U(0^{e_0} 1 p) = x\} && \text{(restriction to } e = e_0\text{)} \\
&= \min\{|0^{e_0} 1 p| : A(p) = x\} && (e_0 \text{ is a code for } A\text{)} \\
&= e_0 + 1 + \min\{|p| : A(p) = x\} \\
&= e_0 + 1 + K_A(x) && \text{(definition of } K_A\text{)}
\end{aligned}$$
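The construction of U from E can be tried out on a toy scale. The sketch below is only an illustration under strong assumptions: the hypothetical list `LANGUAGES` holds three total Python functions standing in for an enumeration of all partial computable functions, and `K` is a brute-force search bounded by `max_len`. None of these names come from the text.

```python
from itertools import product
from math import inf

# Hypothetical toy enumeration: LANGUAGES[e] plays the role of p -> E(e, p).
LANGUAGES = [
    lambda p: p,                       # A_0: literal output
    lambda p: p + p,                   # A_1: doubling
    lambda p: "0" * int("1" + p, 2),   # A_2: unary expansion of the number 1p
]

def E(e, p):
    return LANGUAGES[e](p) if e < len(LANGUAGES) else None

def U(q):
    """U(0^e 1 p) = E(e, p); undefined (None) if q contains no 1."""
    e = q.find("1")                    # index of the first 1 = number of leading zeros
    return None if e == -1 else E(e, q[e + 1:])

def K(f, x, max_len):
    """Brute-force min{|p| : f(p) = x} over programs of length <= max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if f("".join(bits)) == x:
                return n
    return inf

x = "0" * 8
for e, A in enumerate(LANGUAGES):
    print(e, K(A, x, 9), K(U, x, 12))  # compare K_A(x) + e + 1 with K_U(x)
```

For this x the three toy languages give complexities 8, 4 and 3, and the composite U never does worse than any of them by more than the length e + 1 of the prefix that selects the language.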
Let's introduce useful notations. For $f, g : \{0,1\}^* \to \mathbb{N}$, let's write

$$f \le g + O(1) \quad (\text{resp. } f = g + O(1))$$

to mean that there exists a constant $c$ such that

$$\forall x\ f(x) \le g(x) + c \quad (\text{resp. } \forall x\ |f(x) - g(x)| \le c)$$

i.e. $f$ is smaller than (resp. equal to) $g$ up to an additive constant $c$.

What we have just shown can be expressed as the following theorem,
independently obtained by Kolmogorov (1965 [23] p. 5), Chaitin (1966 [5]
§9-11) and Solomonoff (1964 [42] p. 12, who gives the proof as an informal
argument).

Theorem 3 (Invariance theorem). $K_U \le K_A + O(1)$ for any partial computable
$A : \{0,1\}^* \to \{0,1\}^*$. In other words, up to an additive constant,
$K_U$ is the smallest one among the $K_A$'s.

Thus, up to an additive constant, there is a smallest $K_A$. Of course, if $K_U$
and $K_V$ are both smallest, up to an additive constant, then $K_U = K_V + O(1)$.
Whence the following definition.

Definition 4 (Kolmogorov complexity). Kolmogorov complexity $K : \{0,1\}^* \to \mathbb{N}$
is any fixed such smallest (up to an additive constant) function $K_U$.

Let's sum up. The invariance theorem means that, up to an additive 
constant, there is an intrinsic notion of Kolmogorov complexity and we can 
speak of the Kolmogorov complexity of a binary string. 

Q: Which is an integer defined up to a constant. . . Somewhat funny. 

A: You are witty! Statements that only make sense in the limit occur everywhere
in mathematical contexts.

Q: Do not mind, I was joking. 

A: In fact, Kolmogorov argued as follows about the constant, [23] p. 6:

Of course, one can avoid the indeterminacies associated with
the [above] constants, by considering particular [... functions U],
but it is doubtful that this can be done without explicit arbitrariness.
One must, however, suppose that the different "reasonable"
[above universal functions] will lead to "complexity estimates"
that will converge on hundreds of bits instead of tens of thousands.
Hence, such quantities as the "complexity" of the text
of "War and Peace" can be assumed to be defined with what
amounts to uniqueness.

Q: Using the interpretation you mentioned a minute ago with program- 
ming languages concerning the enumeration theorem, the constant in the 
invariance theorem can be viewed as the length of a LISP program which 
interprets A. 

A: You are right. 

2.2 Coding pairs of strings 

A: Have you noted the trick to encode an integer $e$ and a string $p$ into a
string $0^e 1 p$?

Q: Yes, and the constant is the length of the extra part $0^e 1$. But you have
encoded $e$ in unary. Why not use binary representation and thus lower the
constant?

A: There is a problem. It is not trivial to encode two binary strings $u, v$
as a binary string $w$. We need a trick. But first, let's be clear: encode here
means to apply a computable injective function $\{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$.
Observe that concatenation does not work: if $w = uv$ then we don't know
which prefix of $w$ is $u$. A new symbol 2 inserted as a marker allows for an
encoding: from $w = u2v$ we can indeed recover $u, v$. However, $w$ is no longer
a binary string.

A simple solution uses this last idea together with a padding function applied
to $u$ which allows 1 to become an end-marker. Let $\mathrm{pad}(u)$ be obtained
by inserting a new zero in front of every symbol in $u$. For instance,
$\mathrm{pad}(01011) = 0001000101$. Now, a simple encoding $w$ of strings $u, v$ is the
concatenation $w = \mathrm{pad}(u)1v$. In fact, the very definition of pad ensures that
the end of the prefix $\mathrm{pad}(u)$ in $w$ is marked by the first occurrence of 1 at
an odd position (obvious convention: first symbol has position 1). Thus,
from $w$ we get $\mathrm{pad}(u)$, hence $u$, and $v$ in a very simple way: a finite
automaton can do the job! Observe that

$$|\mathrm{pad}(u)1v| = 2|u| + |v| + 1 \qquad (1)$$
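The padding trick is easy to experiment with. The following is a minimal sketch of the encoding and its inverse; the function names `encode` and `decode` are ours, and `decode` assumes its argument really is of the form pad(u)1v.

```python
def pad(u):
    """Insert a zero in front of every symbol of u."""
    return "".join("0" + c for c in u)

def encode(u, v):
    """Encode the pair (u, v) as the single binary string pad(u) 1 v."""
    return pad(u) + "1" + v

def decode(w):
    """Recover (u, v) from w = pad(u) 1 v by scanning odd positions for the first 1."""
    i = 0                      # 0-based index i is the odd position i + 1
    while w[i] == "0":         # still inside pad(u): skip one padded symbol
        i += 2
    return w[1:i:2], w[i + 1:]

w = encode("01011", "110")
print(pad("01011"))            # the example given in the text
print(w, decode(w), len(w))
```

The scan in `decode` only looks at every other symbol and keeps no memory beyond its position, which is the finite automaton flavor mentioned above, and the length of `w` agrees with (1).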



Q: Is the constant 2 the best one can do? 

A: No, one can iterate the trick. Instead of padding $u$, one can pad the
string $\overline{|u|}$ which is the binary representation of the length of $u$. Look at the
string $w = \mathrm{pad}(\overline{|u|})1uv$. The first occurrence of 1 at an odd position tells
you which prefix of $w$ is $\mathrm{pad}(\overline{|u|})$ and which suffix is $uv$. From the prefix
$\mathrm{pad}(\overline{|u|})$, you get $\overline{|u|}$, hence $|u|$. From $|u|$ and the suffix $uv$, you get $u$ and $v$.
Nice trick, isn't it? And since $|\overline{|u|}| = 1 + \lfloor\log(|u|)\rfloor$, we get

$$|\mathrm{pad}(\overline{|u|})1uv| = |u| + |v| + 2\lfloor\log(|u|)\rfloor + 3 \qquad (2)$$

Q: Exciting! One could altogether pad the length of the length.

A: Sure. $w = \mathrm{pad}(\overline{|\overline{|u|}|})1\overline{|u|}uv$ is indeed an encoding of $u, v$. The first occurrence
of 1 at an odd position tells you which prefix of $w$ is $\mathrm{pad}(\overline{|\overline{|u|}|})$ and which
suffix is $\overline{|u|}uv$. From the prefix $\mathrm{pad}(\overline{|\overline{|u|}|})$, you get $\overline{|\overline{|u|}|}$, hence $|\overline{|u|}|$. Now,
from $|\overline{|u|}|$ and the suffix $\overline{|u|}uv$ you get $\overline{|u|}$, hence $|u|$, and $uv$. From $|u|$
and $uv$, you get $u$ and $v$. Also, a simple computation leads to

$$|\mathrm{pad}(\overline{|\overline{|u|}|})1\overline{|u|}uv| = |u| + |v| + \lfloor\log(|u|)\rfloor + 2\lfloor\log(1 + \lfloor\log(|u|)\rfloor)\rfloor + 4 \qquad (3)$$

Q: Of course, we can iterate this process.

A: Right. But, let's leave such refinements.

2.3 Non determinism

Q: Our subject is randomness. Chance, randomness, arbitrariness,
unreasoned choice, non determinism... Why not add randomness to
programs by making them non deterministic with several possible outputs?

A: Caution: if a single program can output every string then Kolmogorov
complexity collapses. In order to get a non trivial theory, you need to restrict
non determinism. There are many reasonable ways to do so. It happens
that all lead to something which is essentially usual Kolmogorov complexity
up to some change of scale ([S], [IE]). Same with the prefix Kolmogorov
complexity which we shall discuss later.
3 How complex is Kolmogorov complexity?



Q: Well, let me tell you some points I see about $K_A$.

The domain of $K_A$ appears to be the range of $A$. So that $K_A$ is total in case
$A$ is onto.

Since there are finitely many programs $p$ with length $\le n$, there can be only
finitely many $x$'s such that $K_A(x) \le n$. So that $\lim_{|x| \to +\infty} K_A(x) = +\infty$.

Also, in the definition of $K_A(x)$, there are 2 points:

1) find some program which outputs $x$,

2) make sure that all programs with shorter length either do not halt or
have output different from $x$.

Point 2 does not match with definitions of partial computable functions!

3.1 Approximation from above 

A: Right. In general, $K_A$ is not partial computable.

Q: So, no way to compute $K_A$.

A: Definitely not, in general. However, $K_A$ can be approximated from
above:

Proposition 5. $K_A$ is the limit of a computable decreasing sequence of
functions.

Moreover, we can take such a sequence of functions with finite domains.
To see this, fix an algorithm $\mathcal{A}$ for $A$ and denote $A_t$ the partial function
obtained by applying up to $t$ steps of algorithm $\mathcal{A}$ for the sole programs
with length $\le t$. It is clear that $(t,p) \mapsto A_t(p)$ has computable graph. Also,

$$K_{A_t}(x) = \min\{|p| : p \in \{0,1\}^{\le t} \text{ and } A_t(p) = x\}$$

so that $(t,x) \mapsto K_{A_t}(x)$ has computable graph too.

To conclude, just observe that $(t,x) \mapsto K_{A_t}(x)$ is decreasing in $t$ (with the
obvious convention that undefined $= +\infty$) and that $K_A(x) = \lim_{t \to \infty} K_{A_t}(x)$.
The same is true for conditional Kolmogorov complexity $K_B(\ |\ )$.
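The approximation from above can be watched at work on a toy machine. The sketch below rests on an invented step-bounded interpreter `run` (an assumption for illustration, not a standard machine model): a program "1" followed by w prints w in one step, while a program "0" followed by the binary writing of n > 0 prints n copies of "01" but needs 4n steps.

```python
from itertools import product
from math import inf

def run(p, t):
    """Toy step-bounded interpreter: the output of program p if it halts
    within t steps, else None."""
    if p.startswith("1"):                  # '1' + w: print w, takes 1 step
        return p[1:] if t >= 1 else None
    if len(p) > 1:                         # '0' + b: print '01' * n, takes 4 * n steps
        n = int(p[1:], 2)
        return "01" * n if n > 0 and t >= 4 * n else None
    return None                            # '' and '0' never halt

def K_t(x, t):
    """K_{A_t}(x): shortest program of length <= t that outputs x within t steps."""
    for length in range(t + 1):
        for bits in product("01", repeat=length):
            if run("".join(bits), t) == x:
                return length
    return inf

x = "01" * 4
approximations = [K_t(x, t) for t in (8, 9, 15, 16)]
print(approximations)
```

For small t nothing is known about x, then the literal program of length 9 is found, and only once 16 steps are allowed does the short but slow program "0100" show up. At no stage does the procedure certify that a still shorter program will not appear later, which is why this gives no computable modulus of convergence.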

Q: If $K_A$ is not computable, there should be no computable modulus of
convergence for this approximation sequence. So what can it be good for?

A: In general, if a function $f : \{0,1\}^* \to \mathbb{N}$ can be approximated from above
by a computable sequence of functions $(f_t)_{t \in \mathbb{N}}$ then $X_n^f = \{x : f(x) \le n\}$ is
computably enumerable (in fact, both properties are equivalent). Which is
a very useful property of $f$. Indeed, such arguments are used in the proof
of some hard theorems in the subject of Kolmogorov complexity.

Q: Could you give me the flavor of what it can be useful for?

A: Suppose you know that $X_n^f$ is finite (which is indeed the case for $f = K_A$)
and has exactly $m$ elements then you can explicitly get $X_n^f$.

Q: Explicitly get a finite set? What do you mean?

A: What we mean is that there is a computable function which associates
to any $m, n$ a code (in whatever modelization of computability) for a partial
computable function which has range $X_n^f$ in case $m$ is equal to the number
of elements in $X_n^f$. This is not trivial. We do this thanks to the $f_t$'s.
Indeed, compute the $f_t(x)$'s for all $t$'s and all $x$'s until you get $m$ different
strings $x_1, \ldots, x_m$ such that $f_{t_1}(x_1), \ldots, f_{t_m}(x_m)$ are defined and $\le n$ for
some $t_1, \ldots, t_m$.

That you will get such $x_1, \ldots, x_m$ is ensured by the fact that $X_n^f$ has at least
$m$ elements and that $f(x) = \min\{f_t(x) : t\}$ for all $x$.

Since $f \le f_t$, surely these $x_i$'s are in $X_n^f$. Moreover, they indeed constitute
the whole of $X_n^f$ since $X_n^f$ has exactly $m$ elements.

3.2 Dovetailing 

Q: You run infinitely many computations, some of which never halt. How 

do you manage them? 

A: This is called dovetailing. You organize these computations (which are
infinitely many, some lasting forever) as follows:

- Do up to 1 computation step of $f_t(x)$ for $0 \le t \le 1$ and $0 \le |x| \le 1$,

- Do up to 2 computation steps of $f_t(x)$ for $0 \le t \le 2$ and $0 \le |x| \le 2$,

- Do up to 3 computation steps of $f_t(x)$ for $0 \le t \le 3$ and $0 \le |x| \le 3$,

- and so on.
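The stages listed above can be sketched with Python generators, where each `next` call performs one computation step. This is a simplified variant under stated assumptions: computations are indexed by a single integer, they are resumed from stage to stage rather than restarted, and `toy_computation` is an invented family in which the odd-numbered computations never halt.

```python
from itertools import count, islice

def toy_computation(i):
    """Hypothetical computation number i: loops forever when i is odd,
    otherwise halts after i steps with result i * i."""
    if i % 2 == 1:
        while True:
            yield None                     # one step, no result yet
    for _ in range(i):
        yield None
    yield ("done", i * i)

def dovetail(make_computation):
    """At stage s, start computation s, then run one more step of every
    computation started so far; yield (index, result) whenever one halts."""
    running = {}
    for stage in count():
        running[stage] = make_computation(stage)
        for i in list(running):
            step = next(running[i])
            if step is not None:
                del running[i]
                yield i, step[1]

first_results = list(islice(dovetail(toy_computation), 3))
print(first_results)
```

Although infinitely many of the toy computations run forever, every halting one is eventually reported, because each started computation receives one more step at every later stage.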

Q: Somehow looks like Cantor's enumeration of $\mathbb{N}^2$ as the sequence
(0, 0) (0, 1) (1, 0) (0, 2) (1, 1) (2, 0) (0, 3) (1, 2) (2, 1) (3, 0) ...

A: This is really the same idea. Here, it would rather be an enumeration à
la Cantor of $\mathbb{N}^3$. When dealing with a multi-indexed family of computations
$(\varphi_t(x))_t$, you can imagine computation steps as tuples of integers $(i,t,x)$
where $i$ denotes the rank of some computation step of $\varphi_t(x)$ (here, these
tuples are triples). Dovetailing is just a way of enumerating all points in a
discrete multidimensional space $\mathbb{N}^k$ via some zigzagging à la Cantor.

Q: Well, Cantor wandering along a broken line which fills the discrete plane
here becomes a way to sequentialize parallel computations.

3.3 Undecidability



Q: Let's go back to $K_A$. If $A$ is onto then $K_A$ is total. So that if $A$ is
also computable then $K_A$ should be computable too. In fact, to get $K_A(x)$
just compute all $A(p)$'s, for increasing $|p|$'s until some has value $x$: this will
happen since $A$ is onto.

You said $K_A$ is in general undecidable. Is this undecidability related to the
fact that $A$ be partial computable and not computable?

A: No. It's possible that $K_A$ be quite trivial with $A$ as complex as you want.
Let $f : \{0,1\}^* \to \mathbb{N}$ be any partial computable function. Set $A(0x) = x$ and
$A(1^{1+f(x)}0x) = x$. Then, $A$ is as complex as $f$ though $K_A$ is trivial since
$K_A(x) = |x| + 1$, as is easy to check.

Q: Is it hard to prove that some Ka is indeed not computable? 

A: Not that much. If $U : \{0,1\}^* \to \{0,1\}^*$ is optimal then we can show
that $K_U$ is not computable. Thus, $K$ (which is $K_U$ for some fixed optimal
$U$) is not computable.

And this is where Berry's paradox comes back. Consider the length-lexicographic
order on binary strings: $u <_{hier} v$ if and only if $|u| < |v|$, or $|u| = |v|$ and $u$
is lexicographically before $v$.

Now, look, we come to the core of the argument. The key idea is to
introduce the function $T : \mathbb{N} \to \{0,1\}^*$ defined as follows:

$$T(i) = \text{the } <_{hier} \text{ smallest } x \text{ such that } K_U(x) > i$$

As you see, this function is nothing but an implementation of the very
statement in Berry's paradox modified according to Kolmogorov's move from
definability to computability via the function $A$. Clearly, we have

$$K_U(T(i)) > i \qquad (4)$$

Suppose, by way of contradiction, that $K_U$ is computable. Then so is $T$ and
so is the function $V : \{0,1\}^* \to \{0,1\}^*$ such that $V(p) = T(Val_2(1p))$ where
$Val_2(1p)$ is the integer with binary representation $1p$.
Now, if $i > 0$ has binary representation $1p$ then $T(i) = V(p)$, so that

$$K_V(T(i)) \le |p| = \lfloor\log(i)\rfloor \qquad (5)$$

The invariance theorem ensures that, for some $c$, we have

$$K_U \le K_V + c \qquad (6)$$

From inequalities (4), (5), (6) we get

$$i < K_U(T(i)) \le K_V(T(i)) + c \le \log(i) + c$$

which is a contradiction for $i$ large enough since $\lim_{i \to +\infty} \frac{\log(i)}{i} = 0$.
Thus, our assumption that $K_U$ be computable is false.
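The step "if $K_U$ is computable then so are $T$ and $V$" is just a search loop. The sketch below makes that explicit for any computable, unbounded function passed as `complexity`; since $K_U$ itself is not computable, the built-in `len` is used as a stand-in, purely as an assumption for the demonstration.

```python
from itertools import count, product

def strings_in_hier_order():
    """All binary strings in length-lexicographic order."""
    for n in count():
        for bits in product("01", repeat=n):
            yield "".join(bits)

def T(i, complexity):
    """The hier-smallest x such that complexity(x) > i.
    Terminates whenever complexity is computable and unbounded."""
    for x in strings_in_hier_order():
        if complexity(x) > i:
            return x

def V(p, complexity):
    """V(p) = T(Val_2(1p)): the program p only has to name the integer i."""
    return T(int("1" + p, 2), complexity)

print(T(3, len), V("1", len))   # with the stand-in, both searches stop at the same string
```

The point of the argument is visible in the signature of `V`: the output has complexity above $i$, yet the program naming it has length about $\log(i)$, which no optimal $U$ can tolerate for large $i$.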

3.4 No non trivial computable lower bound 

Q: Quite a nice argument.

A: We can get much more out of it:

Theorem 6. 1) No restriction of $K_U$ to an infinite computable set is computable.

2) Worse, if $X \subseteq \{0,1\}^*$ is computable and $f : X \to \mathbb{N}$ is a computable
function and $f(x) \le K_U(x)$ for all $x \in X$ then $f$ is bounded!

To prove this, suppose that $f$ is unbounded and just change the above
definition of $T : \mathbb{N} \to \{0,1\}^*$ as follows:

$$T(i) = \text{the } <_{hier} \text{ smallest } x \in X \text{ such that } f(x) > i$$

Clearly, by definition, we have $f(T(i)) > i$. Since $T(i) \in X$ and $f(x) \le K_U(x)$
for $x \in X$, this implies inequality (4) above. Also, $f$ being computable,
so are $T$ and $V$, and inequality (5) still holds. As above, we arrive at a
contradiction.

Let's reformulate this result in terms of the greatest monotone (with respect
to $\le_{hier}$) lower bound of $K_U$ which is

$$m(x) = \min_{y \ge_{hier} x} K_U(y)$$

This function $m$ is monotone and tends to $+\infty$ but it does so incredibly
slowly: on any computable set it cannot grow as fast as any unbounded
computable function.

3.5 Kolmogorov complexity and representation of objects 

Q: You have considered integers and their base 2 representations. Complexity
of algorithms is often much dependent on the way objects are represented.
Here, you have not been very precise about representation of integers.

A: There is a simple fact.

Proposition 7. Let $f : \{0,1\}^* \to \{0,1\}^*$ be partial computable.

1) $K(f(x)) \le K(x) + O(1)$ for every $x$ in the domain of $f$.

2) If $f$ is also injective then $K(f(x)) = K(x) + O(1)$ for $x \in \mathrm{domain}(f)$.

Indeed, denote $U$ some fixed universal function such that $K = K_U$.

To get a program which outputs $f(x)$, we just encode a program $\pi$ computing
$f$ together with a program $p$ outputting $x$.

Formally, let $A : \{0,1\}^* \to \{0,1\}^*$ be such that $A(\mathrm{pad}(\pi)1z)$ is the output
of $\pi$ on input $U(z)$ for all $z \in \{0,1\}^*$. Clearly, $A(\mathrm{pad}(\pi)1p) = f(x)$ so
that $K_A(f(x)) \le 2|\pi| + |p| + 1$. Taking $p$ such that $K(x) = |p|$, we get
$K_A(f(x)) \le K(x) + 2|\pi| + 1$.

The Invariance Theorem ensures that $K(f(x)) \le K_A(f(x)) + O(1)$, whence
point 1 of the Proposition.

In case $f$ is injective, it has a partial computable inverse $g$ with domain the
range of $f$. Applying point 1 to $f$ and $g$ we get point 2.

Q: So all representations of integers lead to the same Kolmogorov complexity,
up to a constant.

A: Yes, as long as one can computably go from one representation to the
other one.

4 Algorithmic Information Theory 
4.1 Zip/Unzip 

Q: A moment ago, you said the subject was also named Algorithmic Infor- 
mation Theory. Why? 

A: Well, you can look at $K(x)$ as a measure of the information contents
that $x$ conveys. The notion can also be vividly described using our everyday
use of compression/decompression software (cf. Alexander Shen's lecture
[40]). First, notice the following simple fact:

Proposition 8. $K(x) \le |x| + O(1)$

Indeed, let $A(x) = x$. Then $K_A(x) = |x|$ and the above inequality is a mere
application of the Invariance Theorem.

Looking at the string $x$ as a file, any program $p$ such that $U(p) = x$ can
be seen as a compressed file for $x$ (especially in case $p$ is indeed shorter
than $x$).

So, $U$ appears as a decompression algorithm which maps the compressed
file $p$ onto the original file $x$. In this way, $K(x)$ measures the length of the
shortest compressed files for $x$.

What does compression do? It eliminates redundancies and makes regularities
explicit to shorten the file. Thus, maximum compression reduces the file to the core
of its information contents which is therefore measured by $K(x)$.
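The zip/unzip picture can be tried with an off-the-shelf compressor. In the sketch below, Python's standard `zlib` module plays the role of one particular decompression algorithm; this is only an analogy, since a real compressor yields an upper bound on information content relative to its own format and says nothing about the optimal $U$.

```python
import os
import zlib

regular = b"01" * 5000            # a highly redundant file of 10000 bytes
noisy = os.urandom(10000)         # 10000 bytes from the operating system's entropy source

packed_regular = zlib.compress(regular, 9)
packed_noisy = zlib.compress(noisy, 9)

# Decompression maps each compressed file back onto the original file.
assert zlib.decompress(packed_regular) == regular
assert zlib.decompress(packed_noisy) == noisy

print(len(regular), len(packed_regular))
print(len(noisy), len(packed_noisy))
```

The regular file shrinks to a small fraction of its size, whereas the noisy one typically does not shrink at all: there are no regularities for the compressor to exploit.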
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4.2 Some relations in Algorithmic Information Theory 

Q: OK. And what does Algorithmic Information Theory look like? 

Q: Conditional complexity should give some nice relations as is the case 
with conditional probability. 

A: Yes, there are relations which have some probability theory flavor. How- 
ever, there are often logarithmic extra terms which come from the encoding 
of pairs of strings. For instance, an easy relation: 

K{x) < K{x I y) + K{y) + 2 log(min(i^(x | y),K{y))) + 0(1) (7) 

The idea to get this relation is as follows. Suppose you have a program p 
(with no parameter) which outputs y and a program q (with one parameter) 
which on input y outputs x, then you can mix them to get a (no parameter) 
program which outputs x. 

Formally, suppose that $p, q$ are optimal, i.e. $K(y) = |p|$ and $K(x \mid y) = |q|$.
Let $A_1, A_2 : \{0,1\}^* \to \{0,1\}^*$ be such that

$$A_1(pad(|z|)\,1\,z\,w) = A_2(pad(|w|)\,1\,z\,w) = V(w, U(z))$$

where $V$ denotes some fixed universal function such that $K(\,\cdot \mid \cdot\,) = K_V(\,\cdot \mid \cdot\,)$.
It is clear that $A_1(pad(|p|)\,1\,p\,q) = A_2(pad(|q|)\,1\,p\,q) = x$, so that

$$K_{A_1}(x) \le |p| + |q| + 2\log(|p|) + O(1)$$
$$K_{A_2}(x) \le |p| + |q| + 2\log(|q|) + O(1)$$

whence, $p, q$ being optimal programs,

$$K_{A_1}(x) \le K(y) + K(x \mid y) + 2\log(K(y)) + O(1)$$
$$K_{A_2}(x) \le K(y) + K(x \mid y) + 2\log(K(x \mid y)) + O(1)$$

Applying the Invariance Theorem, we get (7).
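
The role of the prefix $pad(|z|)1$ in this proof can be sketched in code. The definition of $pad$ is not restated in this section, so the sketch assumes one common choice: write a 0 in front of every bit of the binary expansion of the length. With that assumption, the helper names `join_programs` and `split_programs` (ours) show that two programs can be glued together and separated again at a cost of about $2\log|z| + 1$ extra bits.

```python
from typing import Tuple


def pad(n: int) -> str:
    """Assumed padding: binary expansion of n with a 0 before each bit."""
    return "".join("0" + bit for bit in format(n, "b"))


def join_programs(z: str, w: str) -> str:
    """Build pad(|z|) 1 z w, from which both z and w can be recovered."""
    return pad(len(z)) + "1" + z + w


def split_programs(code: str) -> Tuple[str, str]:
    """Inverse of join_programs: read |z|, then cut z off the front of zw."""
    i, bits = 0, []
    while code[i] == "0":           # a pair "0b" carries one bit b of |z|
        bits.append(code[i + 1])
        i += 2
    i += 1                          # skip the separating 1
    length = int("".join(bits), 2)
    return code[i:i + length], code[i + length:]


p, q = "1011001", "00111"
joined = join_programs(p, q)
print(joined, split_programs(joined))
```

The overhead `len(joined) - len(p) - len(q)` is twice the number of bits of `len(p)` plus one, which is where the $2\log$ term of relation (7) comes from.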

4.3 Kolmogorov complexity of pairs 

Q: What about pairs of strings in the vein of the probability of a pair of 
events? 

A: First, we have to define the Kolmogorov complexity of pairs of strings. 
The key fact is as follows: 

Proposition 9. If $f, g : \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$ are encodings of pairs of
strings (i.e. computable injections), then $K(f(x,y)) = K(g(x,y)) + O(1)$.

As we always argue up to an additive constant, this leads to:

Definition 10. The Kolmogorov complexity of pairs is $K(x, y) = K(f(x, y))$
where $f$ is any fixed encoding.

To prove Proposition 9, observe that $f \circ g^{-1}$ is a partial computable injection
such that $f = (f \circ g^{-1}) \circ g$. Then, apply Proposition 7 with argument $g(x,y)$
and function $f \circ g^{-1}$.

4.4 Symmetry of information 

A: Relation (7) can be easily improved to

$$K(x, y) \le K(x \mid y) + K(y) + 2\log(\min(K(x \mid y), K(y))) + O(1) \qquad (8)$$

The same proof works. Just observe that from both programs $pad(|p|)\,1\,p\,q$
and $pad(|q|)\,1\,p\,q$ one gets $p$, hence also $y$.

Now, (8) can be considerably improved:

Theorem 11. $|K(x, y) - K(x \mid y) - K(y)| = O(\log K(x, y))$

This is a hard result, independently obtained by Kolmogorov and Levin
around 1967 ([Sll, [SQ] p. 117). We had better skip the proof (you can get it in
[ID] p. 6-7 or [30] Thm 2.8.2 p. 182-183).

Q: I don't really see the meaning of that theorem. 

A: Let's restate it in another form. 

Definition 12. $I(x : y) = K(y) - K(y \mid x)$ is called the algorithmic
information about $y$ contained in $x$.

This notion is quite intuitive: you take the difference between the whole
information content of $y$ and that when $x$ is known for free.
Contrary to what was expected by analogy with Shannon's classical
information theory, this is not a symmetric function. However, up to a logarithmic
term, it is symmetric:

Corollary 13. $|I(x : y) - I(y : x)| = O(\log K(x, y))$

For a proof, just apply Theorem 11 with $K(x,y)$ and $K(y,x)$ and observe
that $K(x,y) = K(y,x) + O(1)$ (use Proposition 9).
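
Spelled out, the proof of the corollary is a short computation. The derivation below is our unpacking of the hint just given; it uses only Definition 12, Theorem 11 (applied once to the pair $(x,y)$ and once to $(y,x)$) and Proposition 9, and it uses $\log K(y,x) = \log K(x,y) + O(1)$.

```latex
\begin{align*}
I(x:y) - I(y:x)
  &= \bigl(K(y) - K(y \mid x)\bigr) - \bigl(K(x) - K(x \mid y)\bigr) \\
  &= \bigl(K(x \mid y) + K(y)\bigr) - \bigl(K(y \mid x) + K(x)\bigr) \\
  &= K(x,y) - K(y,x) + O(\log K(x,y)) \\
  &= O(\log K(x,y)).
\end{align*}
```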

5 Kolmogorov complexity and Logic 
5.1 What to do with paradoxes 

Q: Somehow, Solomonoff, Kolmogorov and Chaitin have built up a theory 
from a paradox. 

A: Right. In fact, there seem to be two mathematical ways of dealing with
paradoxes. The most natural one is to get rid of them by building secured and
delimited mathematical frameworks which will leave them all out (at least, 
we hope so. . . ). Historically, this was the way followed in all sciences. A sec- 
ond way, which came up in the 20th century, somehow integrates paradoxes 
into scientific theories via some clever and sound (!) use of the ideas they 
convey. Kolmogorov complexity is such a remarkable integration of Berry's 
paradox into mathematics. 

Q: As Gödel did with the liar paradox, which underlies his incompleteness
theorems. Can we compare these paradoxes?

A: Hard question. The liar paradox is about truth while Berry's is about 
definability. Viewed in computational terms, truth and definability somehow 
correspond to denotational and operational semantics. 
This leads one to expect connections between incompleteness theorems à la
Gödel and Kolmogorov's investigations.

5.2 Chaitin Incompleteness results 

Q: So, incompleteness theorems can be obtained from Kolmogorov theory? 

A: Yes. Gregory Chaitin, 1971 [7], pointed out and popularized a simple but
clever and spectacular application of Kolmogorov complexity (this original
paper by Chaitin did not consider $K$ but the number of states of Turing
machines, which is quite similar).

Let $T$ be a computable theory containing Peano arithmetic such that all
axioms of $T$ are true statements.

Theorem 14. There exists a constant $c$ such that if $T$ proves $K(x) \ge n$
then $n \le c$.

The proof is by way of contradiction and is a redo of the proof that $K$ is
not computable. Suppose that $T$ can prove statements $K(x) \ge n$ for arbitrarily
large $n$'s. Consider a computable enumeration of all theorems of $T$ and let
$f : \mathbb{N} \to \mathbb{N}$ be such that $f(n)$ is the first string $x$ such that $K(x) \ge n$
appears as a theorem of $T$. Our hypothesis ensures that $f$ is total, hence a
computable function. By the very definition of $f$, and since all theorems of $T$
are true,

$$K(f(n)) \ge n \qquad (9)$$

Also, applying Propositions 7 and 8 we get

$$K(f(n)) \le K(n) + O(1) \le \log(n) + O(1) \qquad (10)$$

whence $n \le \log(n) + O(1)$, which is a contradiction if $n$ is large enough.
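
To see how quickly the inequality $n \le \log(n) + c$ breaks down, one can simply search for the last $n$ that satisfies it. This is a small numerical illustration only: the constant hidden in the $O(1)$ depends on $T$ and on the universal function and is not known explicitly, so the values of `c` below are arbitrary, and we assume the logarithm is taken in base 2.

```python
import math


def largest_consistent_n(c: int) -> int:
    """Largest n >= 1 with n <= log2(n) + c, for a constant c >= 1.

    n - log2(n) is nondecreasing for n >= 1, so the n's satisfying the
    inequality form an initial segment and a linear scan is enough.
    """
    n = 1
    while n + 1 <= math.log2(n + 1) + c:
        n += 1
    return n


for c in (1, 10, 100):
    print(c, largest_consistent_n(c))
```

Whatever the constant, only finitely many $n$ survive, which is all the proof needs.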

Q: Quite nice. But this does not give any explicit statement. How to
compute the constant $c$? How to get any explicit $x$'s such that $K(x) > c$?

A: Right. Hum... you could also see this as a particularly strong form of
incompleteness: you have a very simple infinite family of statements, only
finitely many of which can be proved, but you don't know which ones.

5.3 Logical complexity of K 

A: By the way, there is a point we should mention as concerns the logical 
complexity of Kolmogorov complexity. 

Since $K$ is total and not computable, its graph cannot be computably
enumerable (c.e.). However, the graph of any $K_A$ (hence that of $K$) is always
of the form $R \cap S$ where $R$ is c.e. and $S$ is co-c.e. (i.e. the complement of
a c.e. relation).

We can see this as follows. Fix an algorithm $P$ for $A$ and denote by $A^t$ the
partial function obtained by applying up to $t$ computation steps of this
algorithm. Then

$$K_A(x) \le n \iff \exists t\, (\exists p \in \{0,1\}^{\le n}\; A^t(p) = x)$$

The relation within parentheses is computable in $t, n, x$, so that $K_A(x) \le n$
is c.e. in $n, x$.

Replacing $n$ by $n - 1$ and going to negations, we see that $K_A(x) \ge n$ is
co-c.e. Since $K_A(x) = n \iff (K_A(x) \le n) \wedge (K_A(x) \ge n)$, we conclude that
$K_A(x) = n$ is the intersection of a c.e. and a co-c.e. relation.
In terms of Post's hierarchy, the graph of $K_A$ is $\Sigma^0_1 \wedge \Pi^0_1$, hence $\Delta^0_2$. The
same with $K_A(\,\cdot \mid \cdot\,)$.
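
The step-bounded relation can be made concrete on a toy example. Everything in the sketch below is hypothetical: the "language" $A$ (a program $1^k0w$ prints $w$ repeated $k+1$ times and costs $k+1$ steps, a program with no 0 never halts) is invented for illustration, as are the function names. The point is only that for fixed $t$ the relation is decidable by brute force, and that the resulting estimate of $K_A(x)$ can only decrease as $t$ grows.

```python
from itertools import product
from typing import Optional


def run_bounded(p: str, t: int) -> Optional[str]:
    """Toy A^t: output of program p if it halts within t steps, else None."""
    k = p.find("0")                 # number of leading 1s, or -1 if no 0
    if k < 0 or k + 1 > t:
        return None
    return p[k + 1:] * (k + 1)


def complexity_at_most(x: str, n: int, t: int) -> bool:
    """Decidable relation: some p with |p| <= n satisfies A^t(p) = x."""
    return any(
        run_bounded("".join(bits), t) == x
        for length in range(n + 1)
        for bits in product("01", repeat=length)
    )


def approx_from_above(x: str, t: int) -> Optional[int]:
    """Least n <= |x| + 1 such that complexity_at_most(x, n, t), if any."""
    for n in range(len(x) + 2):
        if complexity_at_most(x, n, t):
            return n
    return None


x = "01" * 4
print([approx_from_above(x, t) for t in (1, 2, 3, 4)])
```

With one step allowed, only the literal program `"0" + x` is found; with more steps, shorter programs that exploit the repetition show up.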

Q: Would you remind me about Post's hierarchy? 

A: Emil Post introduced families of relations $R(x_1 \ldots x_m)$ on strings and/or
integers. Let's look at the first two levels:

$\Sigma^0_1$ and $\Pi^0_1$ are the respective families of c.e. and co-c.e. relations,

$\Sigma^0_2$ is the family of projections of $\Pi^0_1$ relations,

$\Pi^0_2$ consists of complements of $\Sigma^0_2$ relations.
Notations $\Sigma^0_i$ and $\Pi^0_i$ come from the following logical characterizations:

$R(x)$ is $\Sigma^0_1$ if $R(x) \iff \exists t_1 \ldots \exists t_k\, T(t, x)$ with $T$ computable.

$R(x)$ is $\Sigma^0_2$ if $R(x) \iff \exists t\, \forall u\, T(t, u, x)$ with $T$ computable.

$\Pi^0_1$ and $\Pi^0_2$ are defined similarly with quantifications $\forall$ and $\forall\exists$.
Each of these families is closed under union and intersection. But not under
complementation since $\Sigma^0_i$ and $\Pi^0_i$ are so exchanged.

A last notation: $\Delta^0_i$ denotes $\Sigma^0_i \cap \Pi^0_i$. In particular, $\Delta^0_1$ means c.e. and
co-c.e. hence computable.

As for inclusion, $\Delta^0_2$ strictly contains the boolean closure of $\Sigma^0_1$, in particular
it contains $\Sigma^0_1 \cup \Pi^0_1$. This is why the term hierarchy is used.

Also, we see that $K_A$ is quite low as a $\Delta^0_2$ relation since $\Sigma^0_1 \wedge \Pi^0_1$ is the
very first level of the boolean closure of $\Sigma^0_1$.

6 Random finite strings and their applications 

6.1 Random versus how much random 

Q: Let's go back to the question: "what is a random string?" 

A: This is the interesting question, but this will not be the one we shall 
answer. We shall modestly consider the question: "To what extent is x 
random?" 

We know that $K(x) \le |x| + O(1)$. It is tempting to declare a string $x$
random if $K(x) \ge |x| - O(1)$. But what does it really mean? The $O(1)$
hides a constant. Let's make it explicit.

Definition 15. A string is called $c$-incompressible (where $c \ge 0$ is any
constant) if $K(x) \ge |x| - c$. Other strings are called $c$-compressible.
0-incompressible strings are also called incompressible.

Q: Are there many c-incompressible strings? 

A: Kolmogorov noticed that they are quite numerous. 

Theorem 16. For each $n$ the proportion of $c$-incompressible among strings
with length $n$ is $> 1 - 2^{-c}$.

For instance, if c = 4 then, for any length n, more than 90% of strings are 
4-incompressible. With c = 7 and c = 10 we go to more than 99% and 
99.9%. 

The proof is a simple counting argument. There are $1 + 2 + 2^2 + \cdots + 2^{n-c-1} =
2^{n-c} - 1$ programs with length $< n - c$. Every string with length $n$ which
is $c$-compressible is necessarily the output of such a program (but some of
these programs may not halt or may output a string with length $\ne n$). Thus,
there are at most $2^{n-c} - 1$ $c$-compressible strings with length $n$, hence at
least $2^n - (2^{n-c} - 1) = 2^n - 2^{n-c} + 1$ $c$-incompressible strings with length $n$.
Whence the proportion stated in the theorem.
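
The counting argument is easy to replay with exact arithmetic. The sketch below (the function name is ours) recomputes the number of programs of length $< n - c$ and the resulting lower bound on the proportion of $c$-incompressible strings, for the three values of $c$ quoted after Theorem 16; it assumes $n \ge c$.

```python
from fractions import Fraction


def incompressible_lower_bound(n: int, c: int) -> Fraction:
    """Lower bound on the proportion of c-incompressible strings of length n.

    Programs of length < n - c number 1 + 2 + ... + 2**(n-c-1) = 2**(n-c) - 1,
    so at most that many strings of length n are c-compressible.
    """
    programs = sum(2 ** k for k in range(n - c))   # lengths 0 .. n-c-1
    return 1 - Fraction(programs, 2 ** n)


for c in (4, 7, 10):
    print(c, float(incompressible_lower_bound(20, c)))
```

The bound is strictly larger than $1 - 2^{-c}$, which gives the percentages 90%, 99% and 99.9% mentioned above with some room to spare.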

Q: Are c-incompressible strings really random?

A: Yes. Martin-Löf, 1965 [34], formalized the notion of statistical test and
proved that incompressible strings pass all these tests (cf. §8.3).

6.2 Applications of random finite strings in computer science 

Q: And what is the use of incompressible strings in computer science? 

A: Roughly speaking, incompressible strings are strings without any form 
of local or global regularity. Consideration of such objects may help almost 
anytime one has to show something is complex, for instance a lower bound 
for worst or average case time/space complexity. The accompanying key 
tool is Proposition 7.

And, indeed, incompressible strings have been successfully used in such
contexts. An impressive compilation of such applications can be found in Ming
Li and Paul Vitanyi's book ([30], chapter 6), running through nearly 100
pages!

Q: Could you give an example? 

A: Sure. The very first such application is quite representative. It is due to 
Wolfgang Paul, 1979 [37] and gives a quadratic lower bound on the computa- 
tion time of any one-tape Turing machine M which recognizes palindromes. 

Up to a linear waste of time, one can suppose that $M$ always halts on its
first cell.

Let $n$ be even and $xx^R = x_1x_2 \cdots x_{n-1}x_nx_nx_{n-1} \cdots x_2x_1$ be a palindrome
written on the input tape of the Turing machine $M$.

For each $i \le n$ let $CS_i$ be the crossing sequence associated to cell $i$, i.e. the
list of successive states of $M$ when its head visits cell $i$.

Key fact: string $x_1x_2 \ldots x_i$ is uniquely determined by $CS_i$.

I.e. $x_1x_2 \ldots x_i$ is the sole string $y$ such that, relative to an $M$-computation
on some palindrome with prefix $y$, the crossing sequence on cell $|y|$ is $CS_i$.

This can be seen as follows. Suppose $y \ne x_1x_2 \ldots x_i$ leads to the same
crossing sequence $CS_i$ on cell $|y|$ for an $M$-computation on some palindrome
$yzz^Ry^R$. Run $M$ on input $yx_{i+1} \ldots x_nx^R$. Consider the behaviour of $M$
while the head is on the left part $y$. This behaviour is exactly the same as
that for the run on input $yzz^Ry^R$ because the sole useful information for $M$
while scanning $y$ comes from the crossing sequence at cell $|y|$. In particular,
$M$, which halts on cell 1, accepts this input $yx_{i+1} \ldots x_nx^R$. But this is
not a palindrome! Contradiction.

Observe that the way $x_1x_2 \cdots x_i$ is uniquely determined by $CS_i$ is quite
complex. But we don't care about that. It will just charge the $O(1)$ constant
in (11).

Using Proposition 7 with the binary string associated to $CS_i$, which is $c$
times longer (where $c = \lceil \log |Q| \rceil$, $|Q|$ being the number of states), we see that

$$K(x_1x_2 \ldots x_i) \le c|CS_i| + O(1) \qquad (11)$$

If $i \ge \frac{n}{2}$ then $x_1x_2 \cdots x_{n/2}$ is uniquely determined by the pair $(x_1x_2 \ldots x_i, \frac{n}{2})$.
Hence also by the pair $(CS_i, \frac{n}{2})$. Since the binary representation of $\frac{n}{2}$ uses
$\le \log(n)$ bits, this pair can be encoded with $2c|CS_i| + \log(n) + 1$ bits. Thus,

$$K(x_1x_2 \ldots x_{n/2}) \le 2c|CS_i| + \log(n) + O(1) \qquad (12)$$

Now, let's sum equations (12) for $i = \frac{n}{2}, \ldots, n$. Observe that the sum of the
lengths of the crossing sequences $CS_{n/2}, \ldots, CS_n$ is at most the number $T$
of computation steps. Therefore, this summation leads to

$$\frac{n}{2} K(x_1x_2 \ldots x_{n/2}) \le 2cT + \frac{n}{2}\log(n) + O\left(\frac{n}{2}\right) \qquad (13)$$

Now, consider a string $x$ such that $x_1x_2 \ldots x_{n/2}$ is incompressible, i.e.
$K(x_1x_2 \cdots x_{n/2}) \ge \frac{n}{2}$. Equation (13) leads to

$$\left(\frac{n}{2}\right)^2 \le 2cT + \frac{n}{2}\log(n) + O\left(\frac{n}{2}\right) \qquad (14)$$

whence $T = \Omega(n^2)$. Since the input $xx^R$ has length $2n$, this proves the
quadratic lower bound. QED

7 Prefix complexity 

7.1 Self delimiting programs 

Q: I heard about prefix complexity. What is it? 

A: Prefix complexity is a very interesting variant of Kolmogorov complexity
which was introduced around 1973 by Levin [28] and, independently, by
Chaitin [8].

The basic idea is taken from some programming languages which have an
explicit delimiter to mark the end of a program. For instance, PASCAL
uses "end.". Thus, no program can be a proper prefix of another program.

Q: This is not true with PROLOG programs: you can always add a new
clause.

A: To execute a PROLOG program, you have to write down a query. And
the end of a query is marked by a full stop. So, it's also true for PROLOG.

Q: OK. However, it's not true for C programs nor LISP programs. 

A: Hum. . . You are right. 

7.2 Chaitin-Levin prefix complexity 

A: Let's say that a set $X$ of strings is prefix-free if no string in $X$ is a proper
prefix of another string in $X$. A programming language $A : \{0,1\}^* \to \{0,1\}^*$
is prefix if its domain is a prefix-free set.
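
For a finite set of strings the prefix-free condition can be checked mechanically. The helper below (its name is ours) sorts the strings: if some string is a proper prefix of another, then it is also a prefix of its immediate successor in lexicographic order, so comparing neighbours suffices.

```python
from typing import Iterable


def is_prefix_free(strings: Iterable[str]) -> bool:
    """True if no string of the set is a proper prefix of another one."""
    ordered = sorted(set(strings))
    # Duplicates were removed, so startswith here means "proper prefix".
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free({"0", "10", "110"}))   # delimiter-style codes
print(is_prefix_free({"0", "01"}))          # "0" is a proper prefix of "01"
```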

Q: So the programming language PASCAL that you seem to be fond of is 
prefix. 

A: Sure, PASCAL is prefix. 

Q: But, what's new with this special condition?

A: Kolmogorov's Invariance Theorem from §2 goes through with prefix
programming languages, leading to the prefix variant $H$ of $K$.

Theorem 17 (Invariance theorem). There exists a prefix partial computable
function $U^{prefix} : \{0,1\}^* \to \{0,1\}^*$ such that $K_{U^{prefix}} \le K_A + O(1)$ for
any prefix partial computable function $A : \{0,1\}^* \to \{0,1\}^*$. In other words,
up to an additive constant, $K_{U^{prefix}}$ is the smallest one among the $K_A$'s.

Definition 18 (Prefix Kolmogorov complexity). Prefix Kolmogorov
complexity $H : \{0,1\}^* \to \mathbb{N}$ is any fixed such function $K_{U^{prefix}}$.

7.3 Comparing K and H 

Q: How does $H$ compare to $K$?

A: A simple relation is as follows:

Proposition 19. $K(x) - O(1) \le H(x) \le K(x) + 2\log(K(x)) + O(1)$.
Idem with $K(\,\cdot \mid \cdot\,)$ and $H(\,\cdot \mid \cdot\,)$.

The first inequality is a mere application of the Invariance Theorem for
$K$ (since $U^{prefix}$ is a programming language). To get the second one, we
consider a programming language $U$ such that $K = K_U$ and construct a prefix
programming language $U'$ as follows: the domain of $U'$ is the set of strings
of the form $pad(|p|)\,1\,p$ and $U'(pad(|p|)\,1\,p) = U(p)$. By very construction,
the domain of $U'$ is prefix-free. Also, $K_{U'}(x) = K_U(x) + 2\log(K_U(x)) + 1$.

An application of the Invariance Theorem for $H$ gives the second inequality
of the Proposition.

This inequality can be improved. A better encoding leads to

$$H(x) \le K(x) + \log(K(x)) + 2\log\log(K(x)) + O(1)$$

Sharper relations have been proved by Solovay, 1975 (unpublished [33], cf.
also [30] p. 211):

Proposition 20. $H(x) = K(x) + K(K(x)) + O(K(K(K(x))))$
$K(x) = H(x) - H(H(x)) - O(H(H(H(x))))$

7.4 How big is H ? 

Q: How big is H ? 

A: K and H behave in similar ways. Nevertheless, there are some differ- 
ences. Essentially a logarithmic term. 

Proposition 21. $H(x) \le |x| + 2\log(|x|) + O(1)$

To prove it, apply the $H$ Invariance Theorem to the prefix function
$A(pad(|x|)\,1\,x) = x$.
Of course, it can be improved to

$$H(x) \le |x| + \log(|x|) + 2\log\log(|x|) + O(1)$$

Q: How big can $H(x) - |x|$ be?

A: Well, to get a non-trivial question, we have to fix the length of the $x$'s.
The answer is not a simple function of $|x|$ as expected, it does use $H$ itself:

$$\max_{|x|=n}(H(x) - |x|) = H(n) + O(1)$$

Q: How big can $H(x) - K(x)$ be?

A: It can be quite large:

$$K(x) < |x| - \log(|x|) < |x| < H(x)$$

happens for arbitrarily large $x$'s ([30] Lemma 3.5.1 p. 208).

7.5 Convergence of series and the Coding Theorem 

Q: What's so really special with this prefix condition? 

A: The possibility to use Kraft's inequality. This inequality tells you that
if $Z$ is a prefix-free set of strings then $\sum_{p \in Z} 2^{-|p|} \le 1$.

Kraft's inequality is not hard to prove. Denote $I_u$ the set of infinite strings
which admit $u$ as prefix. Observe that

1) $2^{-|p|}$ is the probability of $I_p$.

2) If $u, v$ are prefix incomparable then $I_u$ and $I_v$ are disjoint.

3) Since $Z$ is prefix, the $I_u$'s, $u \in Z$, are pairwise disjoint and their union
has probability $\sum_{p \in Z} 2^{-|p|} \le 1$.
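
The three observations can be checked on a small finite example. In the sketch below, the set `Z` and the helper names are ours, and the infinite cylinder $I_u$ is replaced by a finite stand-in: the set of all strings of a fixed length `depth` that extend $u$. Its proportion among strings of that length is $2^{-|u|}$, which mirrors the probability of $I_u$.

```python
from fractions import Fraction
from itertools import product


def kraft_sum(codes) -> Fraction:
    """Sum of 2^(-|p|) over the given finite set of binary strings."""
    return sum((Fraction(1, 2 ** len(p)) for p in codes), Fraction(0))


def cylinder(u: str, depth: int) -> set:
    """Finite stand-in for I_u: all strings of length `depth` extending u."""
    return {u + "".join(tail) for tail in product("01", repeat=depth - len(u))}


Z = {"0", "10", "110", "1110"}              # a prefix-free set
depth = max(len(p) for p in Z)
cylinders = [cylinder(p, depth) for p in Z]
union = set().union(*cylinders)

# The cylinders are pairwise disjoint, so the proportion of their union
# equals the Kraft sum.
print(kraft_sum(Z), Fraction(len(union), 2 ** depth))
```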

The $K_A(x)$'s are lengths of distinct programs in a prefix set (namely, the
domain of $A$). So, Kraft's inequality implies $\sum_{x \in \{0,1\}^*} 2^{-K_A(x)} \le 1$.

In fact, $H$ satisfies the following very important property, proved by Levin
[29] (which can be seen as another version of the Invariance Theorem for
$H$):

Theorem 22 (Coding Theorem). Up to a multiplicative factor, $2^{-H}$ is
maximum among functions $F : \{0,1\}^* \to \mathbb{R}$ such that $\sum_{x \in \{0,1\}^*} F(x) < +\infty$
and which are approximable from below (i.e.
the set of pairs $(x, q)$ such that $q$ is rational and $q < F(x)$ is c.e.).

8 Random infinite sequences 

8.1 Top-down approach to randomness of infinite sequences 

Q: So, we now come to random infinite sequences. 

A: It happens that there are two equivalent ways to get a mathematical 
notion of random sequences. We shall first consider the most natural one, 
which is a sort of "top-down approach" . 

Probability laws tell you that with probability one such and such things 
happen, i.e. that some particular set of sequences has probability one. A 
natural approach leads to consider as random those sequences which satisfy 
all such laws, i.e. belong to the associated sets (which have probability one). 

An easy way to realize this would be to declare a sequence to be random 
just in case it belongs to all sets (of sequences) having probability one or, 
equivalently, to no set having probability zero. Said otherwise, the family 
of random sequences would be the intersection of all sets having probability 
one, i.e. the complement of the union of all sets having probability zero. 
Unfortunately, this family is empty! In fact, let r be any sequence: the 
singleton set {r} has probability zero and contains r. 

In order to maintain the idea, we have to consider a not too big family 
of sets with probability one. 

Q: A countable family. 

A: Right. The intersection of a countable family of sets with probability
one will have probability one. So that the set of random sequences will have 
probability one, which is a much expected property. 



8.2 Frequency tests and von Mises random sequences 

A: This top-down approach was pioneered by Richard von Mises in 1919
([is], [i9]) who insisted on frequency statistical tests. He declared an infinite
binary sequence $a_0a_1a_2 \ldots$ to be random (he used the term Kollektiv) if the
frequency of 1's is "everywhere" fairly distributed in the following sense:

i) Let $S_n$ be the number of 1's among the first $n$ terms of the sequence.
Then $\lim_{n \to \infty} \frac{S_n}{n} = \frac{1}{2}$.

ii) The same is true for every subsequence $a_{n_0+1}a_{n_1+1}a_{n_2+1} \ldots$ where $n_0, n_1, n_2, \ldots$
are the successive integers $n$ such that $\phi(a_0a_1 \ldots a_n) = 1$ where $\phi$ is an
"admissible" place-selection rule.

What is an "admissible" place-selection rule was not definitely settled by 
von Mises. Alonzo Church, 1940, proposed that admissibility be exactly 
computability. 

It is not difficult to prove that the family of infinite binary sequences
satisfying the above condition has probability one for any place-selection 
rule. Taking the intersection over all computable place-selection rules, we 
see that the family of von Mises-Church random sequences has probability 
one. 
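
A place-selection rule is easy to experiment with on finite prefixes. The sketch below is a finite illustration only (von Mises' conditions concern limits along infinite sequences), and the rule `after_a_one` and the sequence `periodic` are our own examples. It shows a sequence whose overall frequency of 1's is exactly 1/2 but which fails condition ii) for a very simple computable rule.

```python
from typing import Callable, List, Sequence


def select(bits: Sequence[int], rule: Callable[[Sequence[int]], bool]) -> List[int]:
    """Terms a_{n+1} for those n such that rule(a_0 ... a_n) holds."""
    return [bits[n + 1] for n in range(len(bits) - 1) if rule(bits[: n + 1])]


def frequency_of_ones(bits: Sequence[int]) -> float:
    """Proportion of 1's in a non-empty finite sequence."""
    return sum(bits) / len(bits)


def after_a_one(prefix: Sequence[int]) -> bool:
    """Hypothetical computable rule: select the term that follows a 1."""
    return prefix[-1] == 1


periodic = [0, 1] * 500          # frequency 1/2, yet clearly not random
selected = select(periodic, after_a_one)
print(frequency_of_ones(periodic), frequency_of_ones(selected))
```

The selected subsequence consists of 0's only, so the periodic sequence is not a Kollektiv.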

However, the von Mises-Church notion of random sequence is too large.
There are probability laws which do not reduce to tests with place-selection
rules and are not satisfied by all von Mises-Church random sequences. As
shown by Jean Ville, [U] 1939, this is the case for the law of iterated
logarithm. This very important law (due to A. I. Khintchin, 1924) expresses
that with probability one

$$\limsup_{n \to +\infty} \frac{S^*_n}{\sqrt{2\log\log(n)}} = 1 \quad \text{and} \quad \liminf_{n \to +\infty} \frac{S^*_n}{\sqrt{2\log\log(n)}} = -1$$

where $S^*_n = \dfrac{S_n - \frac{n}{2}}{\sqrt{\frac{n}{4}}}$ (cf. William Feller's book [Hj, p. 186, 204-205).

Q: Wow! What do these equations mean? 

A: They are quite meaningful. The quantities $\frac{n}{2}$ and $\sqrt{\frac{n}{4}}$ are the
expectation and standard deviation of $S_n$. So that $S^*_n$ is obtained from $S_n$ by
normalization: $S_n$ and $S^*_n$ are linearly related as random variables, and $S^*_n$'s
expectation and standard deviation are 0 and 1.

Let's interpret the limsup equation, the other one being similar (in fact, it
can be obtained from the first one by symmetry).

Remember that $\limsup_{n \to +\infty} f_n$ is obtained as follows. Consider the
sequence $v_n = \sup_{m \ge n} f_m$. The bigger $n$ is, the smaller the set $\{m : m \ge n\}$ is.
So that the sequence $v_n$ decreases, and $\limsup_{n \to +\infty} f_n$ is its limit.
The law of iterated logarithm tells you that with probability one the set
$\{n : S^*_n > \lambda\sqrt{2\log\log(n)}\}$ is finite in case $\lambda > 1$ and infinite in case $\lambda < 1$.
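
The normalization and the quantity compared with $\lambda$ can be computed directly on finite prefixes. The sketch below (function names are ours) is only a numerical aid: the law itself is a statement about limits and cannot be verified on finitely many terms. Natural logarithms are assumed.

```python
import math
from typing import Iterable, Iterator, Tuple


def normalized(s_n: int, n: int) -> float:
    """S*_n = (S_n - n/2) / sqrt(n/4)."""
    return (s_n - n / 2) / math.sqrt(n / 4)


def lil_ratios(bits: Iterable[int]) -> Iterator[Tuple[int, float]]:
    """Yield (n, S*_n / sqrt(2 log log n)) for n >= 3."""
    ones = 0
    for n, bit in enumerate(bits, start=1):
        ones += bit
        if n >= 3:                       # log log n > 0 requires n > e
            yield n, normalized(ones, n) / math.sqrt(2 * math.log(math.log(n)))


alternating = [0, 1] * 500               # ratios shrink towards 0
all_ones = [1] * 1000                    # ratios grow without bound
print(list(lil_ratios(alternating))[-1], list(lil_ratios(all_ones))[-1])
```

Neither of these two regular sequences behaves as the law prescribes: for the first the ratios tend to 0 instead of oscillating up to 1 and down to -1, for the second they blow up.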

Q: OK. 

A: More precisely, there are von Mises-Church random sequences which
satisfy $\frac{S_n}{n} \ge \frac{1}{2}$ for all $n$, a property which is easily seen to contradict the
law of iterated logarithm.

Q: So, von Mises' approach is definitely over. 

A: No. Kolmogorov, 1963 [22], and Loveland, 1966 [31], independently
considered an extension of the notion of place-selection rule.

Q: Kolmogorov once more. . . 

A: Indeed. Kolmogorov allows place-selection rules giving subsequences
proceeding in some new order, i.e. mixed subsequences. The associated
notion of randomness is called Kolmogorov stochastic randomness (cf. [26]
1987). Since there are more conditions to satisfy, stochastic random
sequences form a subclass of von Mises-Church random sequences. They
constitute, in fact, a proper subclass ([31]).

However, it is not known whether they satisfy all classical probability laws.

8.3 Martin-Löf random sequences

Q: So, how to come to a successful theory of random sequences? 
A: Martin-Löf found such a theory.

Q: That was not Kolmogorov? The same Martin-Löf you mentioned
concerning random finite strings?

A: Yes, the same Martin-Löf, in the very same paper [33] in 1965.
Kolmogorov looked for such a notion, but it was Martin-Löf, a Swedish
mathematician, who came to the pertinent idea. At that time, he was a pupil of
Kolmogorov and studied in Moscow. Martin-Löf made no use of Kolmogorov
random finite strings to get the right notion of infinite random sequence.
What he did was to forget about the frequency character of computable
statistical tests (in the von Mises-Church notion of randomness) and look for what
could be the essence of general statistical tests and probability laws. Which
he did both for finite strings and for infinite sequences.

Q: Though intuitive, this concept is rather vague! 

A: Indeed. And Martin-Löf's analysis of what can be a probability law is
quite interesting.

To prove a probability law amounts to proving that a certain set $X$ of
sequences has probability one. To do this, one has to prove that the exception
set, which is the complement $Y = \{0,1\}^{\mathbb{N}} \setminus X$, has probability zero.
Now, in order to prove that $Y \subseteq \{0,1\}^{\mathbb{N}}$ has probability zero, basic measure
theory tells us that one has to include $Y$ in open sets with arbitrarily small
probability. I.e. for each $n \in \mathbb{N}$ one must find an open set $U_n \supseteq Y$ which
has probability $\le 2^{-n}$.

If things were on the real line $\mathbb{R}$ we would say that $U_n$ is a countable
union of intervals with rational endpoints.

Here, in $\{0,1\}^{\mathbb{N}}$, $U_n$ is a countable union of sets of the form $I_u = u\{0,1\}^{\mathbb{N}}$
where $u$ is a finite binary string and $I_u$ is the set of infinite sequences which
extend $u$. Well, in order to prove that $Y$ has probability zero, for each
$n \in \mathbb{N}$ one must find a family $(u_{n,m})_{m \in \mathbb{N}}$ such that $Y \subseteq \bigcup_m I_{u_{n,m}}$ and
$Proba(\bigcup_m I_{u_{n,m}}) \le 2^{-n}$ for each $n \in \mathbb{N}$.

And now Martin-Löf makes a crucial observation: mathematical
probability laws which we can consider necessarily have some effective character.
And this effectiveness should reflect in the proof as follows:

the doubly indexed sequence $(u_{n,m})_{n,m \in \mathbb{N}}$ is computable.

Thus, the set $\bigcup_m I_{u_{n,m}}$ is a computably enumerable open set and $\bigcap_n \bigcup_m I_{u_{n,m}}$
is a countable intersection of a computably enumerable family of open sets.

Q: Has this observation been checked for proofs of usual probability laws?

A: Sure. Be it the law of large numbers, that of iterated logarithm... In
fact, it's quite convincing.

Q: Could this open set $\bigcup_m I_{u_{n,m}}$ not be computable?

A: No. A computable set in $\{0,1\}^{\mathbb{N}}$ is always a finite union of $I_u$'s.

Q: Why? 

A: What does it mean that $Z \subseteq \{0,1\}^{\mathbb{N}}$ is computable? That there is some
Turing machine such that, if you write an infinite sequence $\alpha$ on the input
tape then after finitely many steps, the machine tells you if $\alpha$ is in $Z$ or
not. When it does answer, the machine has read but a finite prefix $u$ of $\alpha$,
so that it gives the same answer if $\alpha$ is replaced by any $\beta \in I_u$. In fact,
an application of König's lemma (which we shall not detail) shows that we
can bound the length of such a prefix $u$. Whence the fact that $Z$ is a finite
union of $I_u$'s.

Q: OK. So, we shall take as random sequences those sequences which are
outside any set which is a countable intersection of a computably enumerable
family of open sets and has probability zero.

A: This would be too much. Remember, $Proba(\bigcup_m I_{u_{n,m}}) \le 2^{-n}$. Thus, the
way the probability of $\bigcup_m I_{u_{n,m}}$ tends to 0 is computably controlled.

So, here is Martin-Löf's definition:

Definition 23. A set of infinite binary sequences is constructively of
probability zero if it is included in $\bigcap_n \bigcup_m I_{u_{n,m}}$ where $(m, n) \mapsto u_{n,m}$ is a partial
computable function $\mathbb{N}^2 \to \{0,1\}^*$ such that $Proba(\bigcup_m I_{u_{n,m}}) \le 2^{-n}$ for all
$n$.

And now comes a very surprising theorem (Martin-Löf, 1966):

Theorem 24. There is a largest set of sequences (for the inclusion ordering) 
which is constructively of probability zero. 

Q: Largest? up to what? 

A: Up to nothing. Really largest set: it is constructively of probability zero 
and contains any other set constructively of probability zero. 

Q: How is it possible? 

A: Via a diagonalization argument. The construction has some
technicalities but we can sketch the ideas. From the well-known existence of
universal c.e. sets, we get a computable enumeration $((O_{i,j})_j)_i$ of sequences
of c.e. open sets. A slight transformation allows us to satisfy the inequality
$Proba(O_{i,j}) \le 2^{-j}$. Now, set $U_j = \bigcup_e O_{e,e+j+1}$ (here lies the diagonalization!).
Clearly, $Proba(U_j) \le \sum_e 2^{-e-j-1} = 2^{-j}$, so that $\bigcap_j U_j$ is constructively of
probability zero. Also, $U_j \supseteq O_{i,i+j+1}$ for all $i, j$, whence $\bigcap_j U_j \supseteq \bigcap_j O_{i,j}$
for every $i$.

Q: So, Martin-Löf random sequences are exactly those lying outside this largest
set.

A: Yes. And all theorems in probability theory can be strengthened by
replacing "with probability one" by "for all Martin-Löf random sequences".

8.4 Bottom-up approach to randomness of infinite sequences:
Martin-Löf's Large oscillations theorem

Q: So, now, what is the bottom-up approach?

A: This approach looks at the asymptotic algorithmic complexity of the
prefixes of the infinite binary sequence $a_0a_1a_2 \ldots$, namely the $K(a_0 \ldots a_n)$'s.

The next theorem is the first significant result relevant to this approach.
Point 2 is due to Albert Meyer and Donald Loveland, 1969 [32] p. 525.
Points 3, 4 are due to Gregory Chaitin, 1976 [9]. (Cf. also [30] 2.3.4 p. 124).

Theorem 25. The following conditions are equivalent:

1) $a_0a_1a_2 \cdots$ is computable

2) $K(a_0 \ldots a_n \mid n) = O(1)$.

3) $|K(a_0 \ldots a_n) - K(n)| \le O(1)$.

4) $|K(a_0 \ldots a_n) - \log(n)| \le O(1)$.

Q: Nice results. Let me tell what I see. We know that $K(x) \le |x| + O(1)$.
Well, if we have the equality, $K(a_0 \ldots a_n) = n - O(1)$, i.e. if maximum
complexity occurs for all prefixes, then the sequence $a_0a_1a_2 \cdots$ should be
random! Is it indeed the case?

A: That's a very tempting idea. And Kolmogorov had also looked for such
a characterization. Unfortunately, as Martin-Löf proved around 1965 (1966,
[35]), there is no such sequence! It is a particular case of a more general
result (just set $f(n)$ = constant).

Theorem 26 (Large oscillations, [35]). Let $f : \mathbb{N} \to \mathbb{N}$ be a computable
function such that $\sum_{n \in \mathbb{N}} 2^{-f(n)} = +\infty$. Then, for every binary sequence
$a_0a_1a_2 \cdots$ there are infinitely many $n$'s such that $K(a_0 \ldots a_n \mid n) < n - f(n)$.

Q: So, the bottom-up approach completely fails as concerns a characteri- 
zation of random sequences. Hum. . . But it does succeed as concerns com- 
putable sequences, which were already fairly well characterized. Funny! 

A: It's however possible to sandwich the set of Martin-Löf random sequences
between two sets of probability one defined in terms of the $K$ complexity of
prefixes.

Theorem 27 ([35]). Let $f : \mathbb{N} \to \mathbb{N}$ be computable such that the series
$\sum_n 2^{-f(n)}$ is computably convergent. Set

$X = \{a_0a_1 \ldots : K(a_0 \ldots a_n \mid n) \ge n - O(1)$ for infinitely many $n$'s$\}$

$Y_f = \{a_0a_1 \ldots : K(a_0 \ldots a_n \mid n) \ge n - f(n)$ for all but finitely many $n$'s$\}$

Denote $ML$ the set of Martin-Löf random sequences. Then $X$ and $Y_f$ have
probability one and $X \subset ML \subset Y_f$.

NB: Proper inclusions have been proved by Peter Schnorr, 1971 [39] (see
also [30] 2.5.15 p. 154).

Let's illustrate this theorem on an easy and spectacular corollary which
uses the fact that $2^{-2\log(n)} = \frac{1}{n^2}$ and that the series $\sum \frac{1}{n^2}$ is computably
convergent: if $K(a_0 \ldots a_n \mid n) \ge n - c$ for infinitely many $n$'s then
$K(a_0 \ldots a_n \mid n) \ge n - 2\log(n)$ for all but finitely many $n$'s.

8.5 Bottom-up approach with prefix complexity 

Q: What about considering prefix Kolmogorov complexity? 

A: Kolmogorov's original idea does work with prefix Kolmogorov complexity.
This has been proved by Claus Peter Schnorr (1974, unpublished, cf.
[8] Remark p. 106, and [10] p. 135-137 for a proof).

Robert M. Solovay, 1974 (unpublished [43]) strengthened Schnorr's result
(cf. [10] p. 137-139).

Theorem 28. The following conditions are equivalent:

1) $a_0a_1a_2 \ldots$ is Martin-Löf random

2) $H(a_0 \ldots a_n) \ge n - O(1)$ for all $n$.

3) $\lim_{n \to +\infty}(H(a_0 \ldots a_n) - n) = +\infty$.

4) For any c.e. sequence $(A_i)_i$ of open subsets of $\{0,1\}^{\mathbb{N}}$, if $\sum_i Proba(A_i) <
+\infty$ then $a_0a_1a_2 \ldots$ belongs to finitely many $A_i$'s.

These equivalences stress the robustness of the notion of Martin-Löf random
sequence.

8.6 Top-down/Bottom-up approaches: a sum up 

Q: I get somewhat confused with these two approaches. Could you sum up?

A: The top-down and bottom-up approaches both work and lead to the 
very same class of random sequences. 

Kolmogorov looked at the bottom-up approach from the very beginning in
1964. But nothing was possible with the original Kolmogorov complexity:
Levin-Chaitin's variant $H$ was needed.

Q: Ten years later. . . 

A: As for the top-down approach, it was pioneered by von Mises from
1919 on and made successful by Martin-Löf in 1965. Martin-Löf had to give
up von Mises frequency tests. However, Kolmogorov was much interested
in these frequency tests ([22]), and he refined them in a very clever way
with the purpose of recovering Martin-Löf randomness, which led him to the
notion of Kolmogorov stochastic randomness. Unfortunately, up to now, we
only know that

Martin-Löf random $\Rightarrow$ stochastic random $\Rightarrow$ von Mises-Church random.

The second implication is known to be strict but not the first one. Were it
an equivalence, this would give a quite vivid characterization of random
sequences via very concrete tests.

8.7 Randomness with other probability distributions 

Q: All this is relative to the uniform probability distribution. Can it be 
extended to arbitrary probability distributions? 

A: Not arbitrary probability distributions, but computable Borel ones:
those distributions $P$ such that the sequence of reals
$(P(I_u))_{u \in \{0,1\}^*}$ (where $I_u$ is the set of infinite sequences
which extend $u$) is computable, i.e. there is a computable function
$f : \{0,1\}^* \times \mathbb{N} \to \mathbb{Q}$ such that

$|P(I_u) - f(u,n)| \leq \frac{1}{2^n}$.

Martin-Löf's definition of random sequences extends trivially. As for
characterizations with variants of Kolmogorov complexity, one has to replace
the length of a finite string $u$ by the quantity $-\log(P(I_u))$.
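
To make this concrete, here is a minimal sketch of one computable Borel
distribution, a biased coin with rational bias. The bias value 1/3 and the
names `cylinder_probability`, `approx` and `weighted_length` are assumptions
made for this illustration, and the logarithm is taken in base 2.

```python
from fractions import Fraction
from math import log2

# Assumption: a Bernoulli distribution where each digit is 1 with probability 1/3.
BIAS = Fraction(1, 3)

def cylinder_probability(u, p=BIAS):
    """P(I_u): probability that an infinite sequence extends the finite string u."""
    ones = u.count("1")
    zeros = len(u) - ones
    return p ** ones * (1 - p) ** zeros

def approx(u, n, p=BIAS):
    """A rational within 2**-n of P(I_u); with a rational bias it is even exact."""
    return cylinder_probability(u, p)

def weighted_length(u, p=BIAS):
    """-log2 P(I_u), the quantity that replaces the length of u."""
    return -log2(float(cylinder_probability(u, p)))
```

With bias 1/2 this is the uniform distribution and `weighted_length(u)` is
just the length of `u`; with another bias, strings rich in the less likely
digit get a larger weighted length.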

8.8 Chaitin's real $\Omega$

Q: I read a lot of things about Chaitin's real $\Omega$.

A: Gregory Chaitin, 1987 [11], made explicit a spectacular random real and
made it very popular.

Consider a universal prefix partial recursive function $U$ and let $\Omega$
be the Lebesgue measure of the set

$\{\alpha \in \{0,1\}^{\mathbb{N}} \mid \exists n\ U(\alpha \upharpoonright n) \text{ is defined}\}$

Q: Seems to be an avatar of the halting problem. 

A: Indeed. It is the probability that, on an infinite input, the machine 
which computes U halts in finite time (hence after reading a finite prefix of 
its input). 

$\Omega = \sum \{2^{-|p|} \mid U \text{ halts on input } p\}$ (15)
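
A short derivation of (15) from the definition of $\Omega$ as a measure may
help. It uses the cylinders $I_p$ of §8.7; the symbol $\lambda$ for the
Lebesgue measure is notation introduced here. Since the domain of $U$ is
prefix-free, the cylinders $I_p$, for $p$ in that domain, are pairwise
disjoint, and each has measure $2^{-|p|}$:

```latex
\Omega
  \;=\; \lambda\Big( \bigcup_{p \in domain(U)} I_p \Big)
  \;=\; \sum_{p \in domain(U)} \lambda(I_p)
  \;=\; \sum_{p \in domain(U)} 2^{-|p|}
```
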
Theorem 29. The binary expansion of $\Omega$ is Martin-Löf random.

Q: How does one prove that $\Omega$ is random?

A: $U$ has prefix-free domain, hence
$\Omega = \sum \{2^{-|p|} \mid p \in domain(U)\} \leq 1$.
Any halting program with length $n$ contributes exactly $2^{-n}$ to $\Omega$.
Thus, if you know the first $k$ digits of $\Omega$ then you know the number of
halting programs with length $\leq k$. From this number, by dovetailing, you
can get the list of the halting programs with length $\leq k$ (cf. §3.1, 3.2).
Having these programs, you can get the first string $u$ which is not the
output of such a program. Clearly, $H(u) > k$. Now, $u$ is computably
obtained from the first $k$ digits of $\Omega$ so that by Proposition 7 we
have $H(u) \leq H(\omega_0 \ldots \omega_k) + O(1)$. Whence
$H(\omega_0 \ldots \omega_k) \geq k - O(1)$, which is condition 2 of Theorem
28 (Schnorr condition). This proves that the binary expansion of $\Omega$ is
a Martin-Löf random sequence.
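
The dovetailing step can be sketched on a toy example. Everything below is
an assumption made for illustration: `TOY_MACHINE` is a hypothetical finite
prefix-free table of programs with their halting times (a real universal
machine has no such table, and its $\Omega$ is not computable). The sketch
runs all programs step by step until the accumulated weight reaches the
known $k$-digit truncation of the halting probability; any program of
length $\leq k$ still running at that point can never halt, since it would
add at least $2^{-k}$.

```python
from fractions import Fraction

# Hypothetical prefix-free toy machine: program -> halting step (None = never halts).
TOY_MACHINE = {
    "0": 5,
    "10": None,
    "110": 2,
    "1110": 9,
    "11110": None,
    "111110": 1,
}

def toy_omega(machine):
    """Halting probability of the toy machine: sum of 2^-|p| over halting p."""
    return sum((Fraction(1, 2 ** len(p)) for p, t in machine.items()
                if t is not None), Fraction(0))

def truncate(x, k):
    """The rational 0.w1...wk given by the first k binary digits of x in [0, 1)."""
    return Fraction(x.numerator * 2 ** k // x.denominator, 2 ** k)

def halting_programs_up_to(machine, omega_prefix, k):
    """Recover the halting programs of length <= k from the first k digits."""
    found, total, step = set(), Fraction(0), 0
    while total < omega_prefix:          # dovetail: one more step for everybody
        step += 1
        for p, t in machine.items():
            if t == step:                # p is seen to halt at this step
                found.add(p)
                total += Fraction(1, 2 ** len(p))
    return sorted(p for p in found if len(p) <= k)
```

For instance, `halting_programs_up_to(TOY_MACHINE, truncate(toy_omega(TOY_MACHINE), 3), 3)`
returns the halting programs of length at most 3 without ever being told
that `"10"` does not halt.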

Q: $\Omega$ seems to depend on the universal machine.

A: Sure. We can speak of the class of Chaitin $\Omega$ reals: those reals
which express the halting probability of some universal prefix programming
language.

Cristian Calude & Peter Hertling & Bakhadyr Khoussainov & Yongge Wang,
1998 [4] (cf. also Antonín Kučera & Theodore Slaman, 2000 [27]) proved
a very beautiful result: $r$ is a Chaitin $\Omega$ real if and only if (the
binary development of) $r$ is Martin-Löf random and $r$ is computably
enumerable from below (i.e. the set of rational numbers $< r$ is c.e.).

Q: I read that this real has incredible properties. 

A: This real has a very simple and appealing definition. Moreover, as we
just noticed, there is a simple way to get all size $n$ halting programs from
its $n$ first digits. This leads to many consequences due to the following
fact: any statement of the form $\exists x\, \Phi(x)$ (where $\Phi$ is a
computable relation) is equivalent to a statement ensuring that a certain
program halts, and this program is about the same size as the statement.
Now, deciding the truth of $\Sigma^0_1$ statements is the same as deciding
that of $\Pi^0_1$ statements.
And significant $\Pi^0_1$ statements abound! Like Fermat's last theorem
(which is now Wiles' theorem) or consistency statements. This is why Chaitin
says $\Omega$ is the "Wisdom real".
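
The link between existential statements and halting can be sketched as an
unbounded search. The names `search_program`, `find_root` and
`violates_goldbach` are made up for this illustration; Goldbach's conjecture
is used here only as a familiar example of a statement of the universal
form, it is not discussed in the text above.

```python
from itertools import count

def search_program(phi):
    """Build a program that halts if and only if some x satisfies phi."""
    def program():
        for x in count():        # try x = 0, 1, 2, ...
            if phi(x):
                return x         # halting witnesses "there exists x with phi(x)"
    return program

# A true existential statement: some x satisfies x * x == 1369.
find_root = search_program(lambda x: x * x == 1369)

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def violates_goldbach(x):
    """phi(x): the even number 2x + 4 is not a sum of two primes."""
    n = 2 * x + 4
    return not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))
```

Calling `find_root()` halts and returns a witness. The program
`search_program(violates_goldbach)` halts if and only if Goldbach's
conjecture fails, so the conjecture itself (a universal statement) holds
exactly when that program runs forever.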

Other properties of $\Omega$ are common to all reals which have Martin-Löf
random binary expansions. For instance, transcendence and the fact that
any theory can give us but finitely many digits.

Hum... About that last point, using Kleene's recursion theorem, Robert
Solovay, 1999 [45], proved that there are particular Chaitin $\Omega$ reals
about which a given theory can not predict any single bit!

8.9 Non computable invariance

Q: In some sense, Martin-Löf randomness is a part of recursion theory. Do
random sequences form a Turing degree or a family of Turing degrees?

A: Oh, no! Randomness is definitely not computably invariant. It's in fact
a very fragile notion: quite insignificant modifications destroy randomness.
This makes objects like $\Omega$ so special.

Let's illustrate this point with an example. Suppose you transform a random
sequence $a_0 a_1 a_2 a_3 \ldots$ into $a_0 0 a_1 0 a_2 0 a_3 0 \ldots$ The
sequence you obtain has the same Turing degree as the original one, but it
is no longer random since its digits with odd ranks are all 0. A random
sequence has to be random everywhere. Hum... for Martin-Löf random reals, I
should rather say "every c.e. where".

Q: Every what? 

A: "Every c.e. where". I mean that if $f$ is a computable function from
$\mathbb{N}$ into $\mathbb{N}$ (in other words, a computable enumeration of
a c.e. set) then the sequence of digits with ranks $f(0), f(1), f(2), \ldots$
of a Martin-Löf random sequence has to be Martin-Löf random. In fact, you
recognize here an extraction process à la von Mises for which a random
sequence should give another random sequence.
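
On finite prefixes, the interleaving and the extraction along a computable
$f$ look as follows. This is only an illustrative sketch: the helper names
are made up, and the pseudo-random list `a` is a mere stand-in, since no
finite list (and no pseudo-random generator) is Martin-Löf random.

```python
import random

def interleave_zeros(prefix):
    """a0 a1 a2 ... -> a0 0 a1 0 a2 0 ..."""
    out = []
    for bit in prefix:
        out += [bit, 0]
    return out

def extract(prefix, f, n):
    """Digits with ranks f(0), f(1), ..., f(n-1)."""
    return [prefix[f(i)] for i in range(n)]

random.seed(0)                                        # reproducible stand-in
a = [random.randint(0, 1) for _ in range(16)]
b = interleave_zeros(a)

recovered = extract(b, lambda i: 2 * i, len(a))       # even ranks give back a
odd_ranks = extract(b, lambda i: 2 * i + 1, len(a))   # odd ranks are all 0
```

The first extraction shows that `a` is computable from `b` (and `b` is
obviously computable from `a`), while the second one exhibits a computable
choice of ranks along which `b` is constant.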

Q: OK. What about many-one degrees? 

A: Same. Let's represent a binary infinite sequence $\alpha$ by the set
$X_\alpha$ of positions of digits 1 in $\alpha$. Then,

$n \in X_{a_0 a_1 a_2 \ldots} \iff 2n \in X_{a_0 0 a_1 0 a_2 0 \ldots}$

Also, let $\varphi(2n) = n$ and $\varphi(2n+1) = k$ where $k$ is some fixed
rank such that $a_k = 0$, then

$n \in X_{a_0 0 a_1 0 a_2 0 \ldots} \iff \varphi(n) \in X_{a_0 a_1 a_2 \ldots}$

These two equivalences prove that $X_{a_0 a_1 a_2 \ldots}$ and
$X_{a_0 0 a_1 0 a_2 0 \ldots}$ are many-one equivalent.
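
Both reductions can be checked mechanically on a finite prefix. The prefix
`a` below is an arbitrary choice made for this sketch, and the Python names
are illustrative only.

```python
def positions_of_ones(prefix):
    """Finite part of the set X_alpha: ranks at which the digit is 1."""
    return {i for i, bit in enumerate(prefix) if bit == 1}

a = [1, 0, 1, 1, 0, 0, 1, 0]              # arbitrary prefix a0 ... a7
b = [bit for x in a for bit in (x, 0)]    # a0 0 a1 0 a2 0 ...

X_a = positions_of_ones(a)
X_b = positions_of_ones(b)

k = a.index(0)                            # a fixed rank with a_k = 0

def phi(n):
    """phi(2n) = n and phi(2n + 1) = k."""
    return n // 2 if n % 2 == 0 else k

# n in X_a  <=>  2n in X_b
a_reduces_to_b = all((n in X_a) == (2 * n in X_b) for n in range(len(a)))
# n in X_b  <=>  phi(n) in X_a
b_reduces_to_a = all((n in X_b) == (phi(n) in X_a) for n in range(len(b)))
```

Both flags evaluate to true on this prefix, which is what the two
equivalences above assert for the whole infinite sequences.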

9 More randomness 

There are more things in heaven and earth, Horatio, 
Than are dreamt of in your philosophy. 

Hamlet, William Shakespeare 

9.1 Beyond c.e.: oracles and infinite computations



Q: Are there other random reals than Chaitin $\Omega$ reals?

A: Sure. Just replace in Martin-Löf's definition the computable
enumerability condition by a more complex one. For instance, you can consider
$\Sigma^0_2$ sets, which amounts to computable enumerability with oracle
$\emptyset'$ (the set which encodes the halting problem for Turing machines).

Q: Wait, wait. Just a minute ago, you said that for all classical probability
laws, c.e. open sets, i.e. $\Sigma^0_1$ sets, are the ones which come in when
proving that the exception set to the law has probability zero. So, what
could be the use of such generalizations?

A: Clearly, the more random sequences you have which satisfy classical
probability laws, the more you strengthen these theorems, as we said earlier.
In this sense, it is better to stick to Martin-Löf's definition. But you may
also want to consider random sequences as the worst objects to use in some
context. Depending on that context, you can be led to ask for much more
complex randomness conditions.

Also, you can have some very natural objects, much like Chaitin's real
$\Omega$, which can be more complex.

Q: Be kind, give an example! 



A: In a recent paper, 2001 [2], Veronica Becher & Chaitin & Sergio Daicz
consider the probability that a prefix universal programming language
produces a finite output, though possibly running indefinitely. They prove
that this probability is an $\emptyset'$ Chaitin real, i.e. its binary
expansion is $\emptyset'$-random.
Becher & Chaitin, 2002 [1], consider the probability for the output to
represent a cofinite set of integers, relative to some coding of sets of
integers by sequences. They prove it to be an $\emptyset''$ Chaitin real.

Such reals are as appealing and remarkable as Chaitin's real $\Omega$, and
they are logically more complex.

9.2 Far beyond: Solovay random reals in set theory 

Q: I heard about Solovay random reals in set theory. Has it anything to do 
with Martin-Lof random reals? 

A: Hum... These notions come from very different contexts. But well,
there is a relation: proper inclusion. Every Solovay random real is
Martin-Löf random. The converse is far from true. In fact, these are two
extreme notions of randomness. Martin-Löf randomness is the weakest
condition whereas Solovay randomness is really the strongest one. So strong
indeed that for Solovay reals you need to work in set theory, not merely in
recursion theory, and even worse, you have to consider two models of set
theory, say $M_1$ and an inner submodel $M_2$ with the same ordinals...

Q: You mean transfinite ordinals? 

A: Yes, $0, 1, 2, 3, \ldots, \omega, \omega+1, \omega+2, \ldots, \omega+\omega$
(which is $\omega.2$) and so on: $\omega.3, \ldots, \omega.\omega$ (which is
$\omega^2$), $\ldots, \omega^3, \ldots, \omega^\omega, \ldots$

In a model of set theory, you have reals and may consider Borel sets, i.e. sets 
obtained from rational intervals via iterated countable unions and countable 
intersections. 

Thus, you have reals in $M_1$ and reals in $M_2$ and every $M_2$ real is
also in $M_1$. You also have Borel sets defined in $M_2$. And to each such
Borel set $X_2$ corresponds a Borel set $X_1$ in $M_1$ with the same
definition (well, some work is necessary to get a precise meaning, but it's
somewhat intuitive). One can show that $X_2 \subseteq X_1$ and that $X_1$,
$X_2$ have the very same measure, which is necessarily a real in $M_2$. Such
a Borel set $X_1$ in $M_1$ will be called an $M_2$-coded Borel set.

Now, a real $r$ in $M_1$ is Solovay random over $M_2$ if it lies in no
measure zero $M_2$-coded Borel set of $M_1$. Such a real $r$ can not lie in
the inner model $M_2$ because $\{r\}$ is a measure zero Borel set and if $r$
were in $M_2$ then $\{r\}$ would be $M_2$-coded and $r$ should be outside
it, a contradiction.
In case $M_1$ is big enough relative to $M_2$ it can contain reals which are
Solovay random over $M_2$. It's a rather tough subject, but you see:

• Martin-Löf random reals are reals outside all c.e. $G_\delta$ sets (i.e.
intersections of a c.e. sequence of open sets) constructively of measure
zero. In other words, outside a very smooth countable family of Borel sets.
Such Borel sets are, in fact, coded in any inner submodel of set theory.

• Solovay random reals over a submodel of set theory are reals outside every
measure zero Borel set coded in that submodel. Thus Solovay random reals
can not be in the inner submodel. They may or may not exist, depending
on how big $M_1$ is relative to $M_2$.

Q: What a strange theory. What about the motivations? 

A: Solovay introduced random reals in set theory at the pioneering time
of independence results in set theory, using the method of forcing invented
by Paul J. Cohen. That was in the 60's. He used them to get a model of
set theory in which every set of reals is Lebesgue measurable [44].

Q: Wow! It's getting late.

A: Hope you are not exhausted.

Q: I really enjoyed talking with you on such a topic. 

Note. The best references on the subject are

• Li & Vitanyi's book [30]

• Downey & Hirschfeldt's book [13]

Caution: In these two books, $C$, $K$ denote what is here (and in many
papers) denoted $K$, $H$.

Among other very useful references: [3], [12], [17], [10] and [16].

Gregory Chaitin's papers are available on his home page. 

References 

[1] V. Becher and G. Chaitin. Another example of higher order random- 
ness. Fund. Inform., 51(4):325-338, 2002. 

[2] V. Becher, G. Chaitin, and S. Daicz. A highly random number. In C.S.
Calude, M.J. Dineen, and S. Sburlan, editors, Proceedings of the Third
Discrete Mathematics and Theoretical Computer Science Conference
(DMTCS'01), pages 55-68. Springer-Verlag, 2001.

[3] C. Calude. Information and randomness. Springer, 1994. 

[4] C.S. Calude, P.H. Hertling, B. Khoussainov, and Y. Wang. Recursively
enumerable reals and Chaitin numbers. In STACS 98 (Paris, 1998),
number 1373 in Lecture Notes in Computer Science, pages 596-606.
Springer-Verlag, 1998.

[5] G. Chaitin. On the length of programs for computing finite binary 
sequences. J. Assoc. Comput. Mach., 13:547-569, 1966. 

[6] G. Chaitin. On the length of programs for computing finite binary 
sequences: statistical considerations. J. Assoc. Comput. Mach., 16:145- 
159, 1969. 

[7] G. Chaitin. Computational complexity and Gödel incompleteness theorem.
ACM SIGACT News, 9:11-12, 1971. Available on Chaitin's home page.






[8] G. Chaitin. A theory of program size formally identical to information 
theory. Journal of the ACM, 22:329-340, 1975. Available on Chaitin's 
home page. 

[9] G. Chaitin. Information theoretic characterizations of infinite strings. 
Theoret. Comput. Sci., 2:45-48, 1976. Available on Chaitin's home 
page. 

[10] G. Chaitin. Algorithmic Information Theory. Cambridge University 
Press, 1987. 

[11] G. Chaitin. Incompleteness theorems for random reals. Advances in 
Applied Math., pages 119-146, 1987. Available on Chaitin's home page. 

[12] J. P. Delahaye. Information, complexité, hasard. Hermès, 1999 (2d
edition).

[13] R. Downey and D. Hirschfeldt. Algorithmic randomness and complexity. 
Springer, 2006. To appear. 

[14] W. Feller. Introduction to probability theory and its applications, vol- 
ume 1. John Wiley, 1968 (3d edition). 

[15] M. Ferbus and S. Grigorieff. Kolmogorov complexities $K_{\min}$, $K_{\max}$
on computable partially ordered sets. Theoret. Comput. Sci., 352:159-180,
2006.

[16] M. Ferbus and S. Grigorieff. Kolmogorov complexity and set theoretical 
representations of integers. Math. Logic Quarterly, 52(4):381-409, 2006. 

[17] P. Gács. Lecture notes on descriptional complexity and randomness.
Boston University, pages 1-67, 1993.
http://cs-pub.bu.edu/faculty/gacs/Home.html

[18] S. Grigorieff and J.Y. Marion. Kolmogorov complexity and non- 
determinism. Theoret. Comput. Sci., 271:151-180, 2002. 

[19] Y. Gurevich. The Logic in Computer Science Column: On Kolmogorov
machines and related issues. Bull. EATCS, 35:71-82, 1988.
http://research.microsoft.com/~gurevich/, paper 78.

[20] D. Knuth. The Art of Computer Programming. Volume 2: Seminumerical
algorithms. Addison-Wesley, 1981 (2d edition).






[21] A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung.
Springer-Verlag, 1933. English translation 'Foundations of the Theory
of Probability', Chelsea, 1956.

[22] A.N. Kolmogorov. On tables of random numbers. Sankhya, The Indian 
Journal of Statistics, ser. A, 25:369-376, 1963. 

[23] A.N. Kolmogorov. Three approaches to the quantitative definition of
information. Problems Inform. Transmission, 1(1):1-7, 1965.

[24] A.N. Kolmogorov. Some theorems about algorithmic entropy and
algorithmic information. Uspekhi Mat. Nauk, 23(2):201, 1968. (in Russian).

[25] A.N. Kolmogorov. Combinatorial foundation of information theory and 
the calculus of probability. Russian Math. Surveys, 38(4):29-40, 1983. 

[26] A.N. Kolmogorov and V. Uspensky. Algorithms and randomness. SIAM
J. Theory Probab. Appl., 32:389-412, 1987.

[27] A. Kučera and T.A. Slaman. Randomness and recursive enumerability.
SIAM J. on Computing, 2001. To appear.

[28] L. Levin. On the notion of random sequence. Soviet Math. Dokl., 
14(5):1413-1416, 1973. 

[29] L. Levin. Random conservation inequalities; information and indepen- 
dence in mathematical theories. Information and Control, 61:15-37, 
1984. 

[30] M. Li and P. Vitanyi. An introduction to Kolmogorov complexity and 
its applications. Springer, 1997 (2d edition). 

[31] D. Loveland. A new interpretation of von Mises's concept of random 
sequence. Z. Math. Logik und Grundlagen Math., 12:279-294, 1966. 

[32] D. Loveland. A variant of the Kolmogorov concept of complexity. In- 
formation and Control, 15:510-526, 1969. 

[33] Michael Machtey and Paul Young. An introduction to the general theory
of algorithms. North-Holland, New York, 1978.

[34] P. Martin-Löf. The definition of random sequences. Information and
Control, 9:602-619, 1966.






[35] P. Martin-Löf. Complexity of oscillations in infinite binary sequences.
Z. Wahrscheinlichkeitstheorie verw. Geb., 19:225-230, 1971.

[36] J. Miller and L. Yu. On initial segment complexity and degrees of 
randomness. Trans. Amer. Math. Soc. to appear. 

[37] W. Paul. Kolmogorov's complexity and lower bounds. In L. Budach, ed- 
itor, Proc. 2nd Int. Conf. Fundamentals of Computation Theory, pages 
325-334. Akademie Verlag, 1979. 

[38] B. Russell. Mathematical logic as based on the theory of types. Amer.
J. Math., 30:222-262, 1908. Reprinted in 'From Frege to Gödel. A source
book in mathematical logic, 1879-1931', J. van Heijenoort ed., p. 150-
182, 1967.

[39] P. Schnorr. A unified approach to the definition of random sequences. 
Math. Systems Theory, 5:246-258, 1971. 

[40] A. Shen. Kolmogorov complexity and its applications. Lecture Notes,
Uppsala University, Sweden, pages 1-23, 2000.
http://www.csd.uu.se/~vorobyov/Courses/KC/2000/all.ps

[41] A. Shen and V. Uspensky. Relations between varieties of Kolmogorov 
complexities. Mathematical systems theory, 29:271-292, 1996. 

[42] R. Solomonoff. A formal theory of inductive inference, part 1. Infor- 
mation and control, 7:1-22, 1965. 

[43] R.M. Solovay. Draft of paper (or series of papers) on Chaitin's work,
done for the most part during the period of Sept.-Dec. 1974. Unpublished
manuscript, IBM Thomas Watson Research Center, Yorktown Heights,
NY.

[44] R.M. Solovay. A model of set theory in which every set of reals is 
Lebesgue measurable. Annals of Mathematics, 92:1-56, 1970. 

[45] R.M. Solovay. A version of $\Omega$ for which ZFC can not predict a
single bit. Centre for Discrete Math and Comp. Sc., Auckland, New Zealand,
104:1-11, 1999.
http://www.cs.auckland.ac.nz/staff-cgi-bin/mjd/secondcgi.pl

[46] V.A. Uspensky, A.L. Semenov, and A.Kh. Shen. Can an individual
sequence of zeros and ones be random. Russian Math. Surveys, 41(1):121-
189, 1990.






[47] J. Ville. Étude critique de la notion de Collectif. Gauthier-Villars, 1939.

[48] R. von Mises. Grundlagen der Wahrscheinlichkeitsrechnung. Mathemat.
Zeitsch., 5:52-99, 1919.

[49] R. von Mises. Probability, Statistics and Truth. Macmillan, 1939. 
Reprinted: Dover, 1981. 

[50] A. Zvonkin and L. Levin. The complexity of finite objects and the 
development of the concepts of information and randomness by means 
of the theory of algorithms. Russian Math. Surveys, 6:83-124, 1970. 






